In-Warehouse Inference on Snowflake and BigQuery

Outcome focus: Compared Snowflake and BigQuery patterns for scheduled in-warehouse inference, corrected CDC assumptions, and defined monitoring, grants, and deployment checks.

Not every model needs a real-time API.

Sometimes the cleanest deployment is a scored table.

The data is already in the warehouse. The downstream system already reads from the warehouse. The decision happens daily, hourly, or every few minutes. The model is small enough, stable enough, and batch-friendly enough that moving data out to an online service only adds cost and failure modes.

That is where in-warehouse inference is useful.

Score close to the data.

Write results back to a table.

Let the rest of the business consume the score through the warehouse, CRM sync, reverse ETL, dashboard, or operational workflow.

This post covers two practical patterns:

Snowflake scoring with Streams, Tasks, and Python UDFs.
BigQuery scoring with scheduled queries, change history or watermarks, and BigQuery ML or Vertex AI remote models.

The goal is not to pretend warehouses replace model serving platforms.

The goal is to use the right boundary.

When in-warehouse inference fits#

Use warehouse inference when:

scoring can run in batch or micro-batch
the source data is already in the warehouse
the model input is tabular or SQL-friendly
latency requirements are minutes or hours, not milliseconds
downstream consumers want tables
governance is easier inside the warehouse boundary
the model artifact is small enough or hosted behind a warehouse-accessible prediction interface

Avoid it when:

the decision needs low-latency online inference
the model is too large or slow for UDF execution
request-time context lives outside the warehouse
the feature path differs from the warehouse data path
the model requires GPUs or custom runtime behavior
per-row calls would create unacceptable cost or latency

The warehouse pattern is an operating choice, not a badge of maturity.

The corrected Snowflake shape#

The draft pattern people often write is:

Create a stream on a table, then query metadata columns from the base table.

That is wrong.

Snowflake stream metadata columns such as METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID are returned when you query the stream object, not the base table.

Snowflake documents that a stream returns the same columns as the source object plus these metadata columns:

METADATA$ACTION, with INSERT or DELETE
METADATA$ISUPDATE, indicating whether the row is part of an update pair
METADATA$ROW_ID, a unique immutable row id for tracking changes

Updates are represented as a DELETE and INSERT pair with METADATA$ISUPDATE = TRUE.

For scoring current rows, you usually want the INSERT side. That includes new inserts and the after-image of updates.

Snowflake example#

Assume a feature table:

CREATE OR REPLACE TABLE RAW_FEATURES (
  ID STRING,
  AGE FLOAT,
  RX_COST_30D FLOAT,
  VISITS_90D FLOAT,
  COPAY_VOLATILITY FLOAT,
  UPDATED_AT TIMESTAMP_NTZ
);

Create a scored table:

CREATE OR REPLACE TABLE SCORED_CUSTOMERS (
  ID STRING,
  PREDICTED_PROBABILITY FLOAT,
  SCORED_AT TIMESTAMP_NTZ,
  MODEL_VERSION STRING
);

Create the stream:

CREATE OR REPLACE STREAM RAW_FEATURES_STREAM
  ON TABLE RAW_FEATURES;

Create the Python UDF:

CREATE OR REPLACE FUNCTION PREDICT_PROBA(FEATURES ARRAY)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = 3.12
PACKAGES = ('scikit-learn', 'joblib', 'numpy')
IMPORTS = ('@MODEL_STAGE/model.joblib')
HANDLER = 'handler'
AS $$
import sys
import joblib
import numpy as np
 
import_dir = sys._xoptions["snowflake_import_directory"]
model = joblib.load(open(import_dir + "/model.joblib", "rb"))
 
def handler(features):
    x = np.array([features], dtype=float)
    return float(model.predict_proba(x)[0, 1])
$$;

Then create a task that consumes the stream:

CREATE OR REPLACE TASK SCORE_CHANGED_CUSTOMERS
  WAREHOUSE = ML_WH
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_FEATURES_STREAM')
AS
MERGE INTO SCORED_CUSTOMERS AS target
USING (
  SELECT
    ID,
    ARRAY_CONSTRUCT(
      AGE,
      RX_COST_30D,
      VISITS_90D,
      COPAY_VOLATILITY
    ) AS FEATURES
  FROM RAW_FEATURES_STREAM
  WHERE METADATA$ACTION = 'INSERT'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY ID
    ORDER BY UPDATED_AT DESC
  ) = 1
) AS source
ON target.ID = source.ID
WHEN MATCHED THEN UPDATE SET
  target.PREDICTED_PROBABILITY = PREDICT_PROBA(source.FEATURES),
  target.SCORED_AT = CURRENT_TIMESTAMP(),
  target.MODEL_VERSION = 'risk_model_2026_04_27'
WHEN NOT MATCHED THEN INSERT (
  ID,
  PREDICTED_PROBABILITY,
  SCORED_AT,
  MODEL_VERSION
) VALUES (
  source.ID,
  PREDICT_PROBA(source.FEATURES),
  CURRENT_TIMESTAMP(),
  'risk_model_2026_04_27'
);

New Snowflake tasks start suspended, so resume the task:

ALTER TASK SCORE_CHANGED_CUSTOMERS RESUME;

The important correction is the source:

FROM RAW_FEATURES_STREAM

not:

FROM RAW_FEATURES

That is where the CDC metadata lives.

Snowflake operator notes#

Stage the model artifact:

PUT file://model.joblib @MODEL_STAGE AUTO_COMPRESS = FALSE;

Grant:

usage on warehouse
usage on database and schema
read on stage
usage on function
select on source table or stream as appropriate
ownership or operate privileges for task administration

Monitor:

task history
stream consumption
rows scored
UDF errors
warehouse credit usage
score distribution
model version distribution
calibration drift when labels arrive

Snowflake Python UDFs are convenient, but convenience is not free. Per-row UDF execution can become expensive or slow if the model is heavy or the row volume is large. For high-volume scoring, consider vectorized UDFs, Snowpark batch jobs, or scoring outside the warehouse and writing results back.

BigQuery options#

BigQuery has a different shape.

The main options are:

BigQuery ML models scored with ML.PREDICT.
Imported models supported by BigQuery ML, such as TensorFlow, ONNX, TensorFlow Lite, and XGBoost.
Remote models backed by Vertex AI endpoints, called through ML.PREDICT.
Scheduled queries for recurring scoring.
Change history functions for incremental processing, currently Preview.
A conservative watermark pattern when Preview change history is not appropriate.

The cleanest production pattern depends on where the model lives.

If the model can live in BigQuery ML, use ML.PREDICT directly.

If the model is a custom Python or large model hosted in Vertex AI, create a BigQuery remote model over the Vertex AI endpoint and call ML.PREDICT.

BigQuery remote model example#

BigQuery supports remote models over custom Vertex AI models using CREATE MODEL ... REMOTE WITH CONNECTION.

The docs emphasize that remote models are useful when a model is too large to import into BigQuery or when you want one inference endpoint for online, batch, and micro-batch use cases.

Example:

CREATE OR REPLACE MODEL `PROJECT_ID.ml.risk_remote_model`
INPUT (
  age FLOAT64,
  rx_cost_30d FLOAT64,
  visits_90d FLOAT64,
  copay_volatility FLOAT64
)
OUTPUT (
  predicted_probability FLOAT64
)
REMOTE WITH CONNECTION `PROJECT_ID.us.vertex_connection`
OPTIONS (
  ENDPOINT = 'https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID'
);

Score a table:

SELECT
  id,
  predicted_probability,
  CURRENT_TIMESTAMP() AS scored_at,
  'risk_model_2026_04_27' AS model_version
FROM ML.PREDICT(
  MODEL `PROJECT_ID.ml.risk_remote_model`,
  (
    SELECT
      id,
      age,
      rx_cost_30d,
      visits_90d,
      copay_volatility
    FROM `PROJECT_ID.features.raw_features`
  )
);

Then merge results:

MERGE `PROJECT_ID.ml.scored_customers` AS target
USING (
  SELECT
    id,
    predicted_probability,
    CURRENT_TIMESTAMP() AS scored_at,
    'risk_model_2026_04_27' AS model_version
  FROM ML.PREDICT(
    MODEL `PROJECT_ID.ml.risk_remote_model`,
    (
      SELECT
        id,
        age,
        rx_cost_30d,
        visits_90d,
        copay_volatility
      FROM `PROJECT_ID.features.raw_features`
      WHERE updated_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    )
  )
) AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET
  predicted_probability = source.predicted_probability,
  scored_at = source.scored_at,
  model_version = source.model_version
WHEN NOT MATCHED THEN INSERT (
  id,
  predicted_probability,
  scored_at,
  model_version
) VALUES (
  source.id,
  source.predicted_probability,
  source.scored_at,
  source.model_version
);

Schedule this as a BigQuery scheduled query.

BigQuery scheduled queries are written in GoogleSQL and can include DDL and DML. The schedule time is converted to UTC, which matters when stakeholders expect local-time behavior.

BigQuery incremental choices#

There are two common incremental patterns.

The conservative pattern is a watermark:

WHERE updated_at > (
  SELECT COALESCE(MAX(scored_at), TIMESTAMP '1970-01-01')
  FROM `PROJECT_ID.ml.scored_customers`
)

This is simple, but it assumes updated_at is reliable and that late-arriving records are handled intentionally. In real systems, use a lookback window:

WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)

Then MERGE makes the operation idempotent.

The newer option is BigQuery change history.

BigQuery change history provides APPENDS and CHANGES functions. APPENDS returns rows appended during a time range. CHANGES returns rows changed during a time range, but the table must have enable_change_history set to TRUE. The feature is currently Preview, which means it belongs behind a deliberate platform decision, not casual copy-paste.

Use change history when:

the account accepts Preview feature risk
storage and compute costs are understood
changes include updates and deletes, not only appends
IAM and row or column access policies are compatible
the scoring workflow needs true changed-row processing

Use watermarks when:

the source table has reliable update timestamps
append-heavy behavior dominates
operational simplicity matters more than full CDC semantics

BigQuery operator notes#

For remote models:

create a BigQuery connection
grant the connection service account access to Vertex AI
match remote model input and output fields to the endpoint contract
keep region alignment in mind
monitor endpoint latency and BigQuery job cost

For scheduled queries:

enable BigQuery Data Transfer Service requirements
use a service account with least privilege
write idempotent MERGE statements
parameterize windows carefully
alert on failures

For scoring:

log model version
log scored timestamp
preserve input snapshot or feature hash
monitor row deltas
monitor score distribution
recompute Brier and log loss when labels arrive

Snowflake versus BigQuery#

Snowflake Streams and Tasks feel like a native CDC pipeline:

stream captures table changes
task triggers on stream data or schedule
Python UDF or Snowpark code scores
MERGE writes output

BigQuery feels more query orchestration oriented:

scheduled query runs SQL
ML.PREDICT scores BigQuery ML, imported, or remote Vertex AI models
change history can provide incremental deltas in Preview
watermark patterns remain practical and stable

Neither is universally better.

Snowflake is strong when stream-driven tasks and in-warehouse Python are the natural fit.

BigQuery is strong when BigQuery ML, scheduled SQL, and Vertex AI remote models match the architecture.

The decision should follow:

where the data lives
where the model lives
latency requirement
volume
governance boundary
cost model
operational ownership

The calibration connection#

In-warehouse inference does not remove the need for model controls.

If the score is a probability, monitor calibration.

At minimum:

score distribution by run
positive rate by probability bucket when labels arrive
Brier score
log loss
thresholded decision rate
segment calibration

The post Probability Calibration Is an Operating Control covers this side of the work. The warehouse runbook should carry that monitoring into production.

The warehouse can make scoring easier.

It does not make the score trustworthy by itself.

Failure modes#

The first Snowflake failure mode is querying the base table and expecting stream metadata columns.

Query the stream.

The second is treating updates as simple rows without understanding INSERT and DELETE pairs.

For current-state scoring, consume the INSERT side.

The third is using per-row Python UDFs for a model that should be batch-scored elsewhere.

Convenience can become a credit bill.

The first BigQuery failure mode is assuming CDC is identical to Snowflake streams.

It is not. BigQuery change history exists, but it is Preview. Watermarks are often the practical default.

The second is creating a remote model without respecting endpoint schema, region, and permissions.

The third is scheduling a query that is not idempotent.

If it runs twice, the output should still be correct.

The point#

In-warehouse inference is not a shortcut around MLOps.

It is a deployment pattern.

Use it when the decision is batch-friendly, the data already lives in the warehouse, and the score can be written back as a governed table.

Snowflake gives you Streams and Tasks.

BigQuery gives you Scheduled Queries, BigQuery ML, remote Vertex AI models, and change history where appropriate.

The engineering standard is the same in both:

Score only what changed when possible.

Make the scoring job idempotent.

Log model version and scored time.

Monitor row counts, failures, cost, and calibration.

Keep the score close to the decision it supports.

In-Warehouse Inference on Snowflake and BigQuery

When in-warehouse inference fits#

The corrected Snowflake shape#

Snowflake example#

Snowflake operator notes#

BigQuery options#

BigQuery remote model example#

BigQuery incremental choices#

BigQuery operator notes#

Snowflake versus BigQuery#

The calibration connection#

Failure modes#

The point#

Related notes#

Sources#