Outcome focus: Compared Snowflake and BigQuery patterns for scheduled in-warehouse inference, corrected CDC assumptions, and defined monitoring, grants, and deployment checks.
snowflakebigquerymlopsinferencedata engineering
Not every model needs a real-time API.
Sometimes the cleanest deployment is a scored table.
The data is already in the warehouse. The downstream system already reads from the warehouse. The decision happens daily, hourly, or every few minutes. The model is small enough, stable enough, and batch-friendly enough that moving data out to an online service only adds cost and failure modes.
That is where in-warehouse inference is useful.
Score close to the data.
Write results back to a table.
Let the rest of the business consume the score through the warehouse, CRM sync, reverse ETL, dashboard, or operational workflow.
This post covers two practical patterns:
- Snowflake scoring with Streams, Tasks, and Python UDFs.
- BigQuery scoring with scheduled queries, change history or watermarks, and BigQuery ML or Vertex AI remote models.
The goal is not to pretend warehouses replace model serving platforms.
The goal is to use the right boundary.
When in-warehouse inference fits#
Use warehouse inference when:
- scoring can run in batch or micro-batch
- the source data is already in the warehouse
- the model input is tabular or SQL-friendly
- latency requirements are minutes or hours, not milliseconds
- downstream consumers want tables
- governance is easier inside the warehouse boundary
- the model artifact is small enough or hosted behind a warehouse-accessible prediction interface
Avoid it when:
- the decision needs low-latency online inference
- the model is too large or slow for UDF execution
- request-time context lives outside the warehouse
- the feature path differs from the warehouse data path
- the model requires GPUs or custom runtime behavior
- per-row calls would create unacceptable cost or latency
The warehouse pattern is an operating choice, not a badge of maturity.
The corrected Snowflake shape#
The draft pattern people often write is:
Create a stream on a table, then query metadata columns from the base table.
That is wrong.
Snowflake stream metadata columns such as METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID are returned when you query the stream object, not the base table.
Snowflake documents that a stream returns the same columns as the source object plus these metadata columns:
METADATA$ACTION, withINSERTorDELETEMETADATA$ISUPDATE, indicating whether the row is part of an update pairMETADATA$ROW_ID, a unique immutable row id for tracking changes
Updates are represented as a DELETE and INSERT pair with METADATA$ISUPDATE = TRUE.
For scoring current rows, you usually want the INSERT side. That includes new inserts and the after-image of updates.
Snowflake example#
Assume a feature table:
CREATE OR REPLACE TABLE RAW_FEATURES (
ID STRING,
AGE FLOAT,
RX_COST_30D FLOAT,
VISITS_90D FLOAT,
COPAY_VOLATILITY FLOAT,
UPDATED_AT TIMESTAMP_NTZ
);Create a scored table:
CREATE OR REPLACE TABLE SCORED_CUSTOMERS (
ID STRING,
PREDICTED_PROBABILITY FLOAT,
SCORED_AT TIMESTAMP_NTZ,
MODEL_VERSION STRING
);Create the stream:
CREATE OR REPLACE STREAM RAW_FEATURES_STREAM
ON TABLE RAW_FEATURES;Create the Python UDF:
CREATE OR REPLACE FUNCTION PREDICT_PROBA(FEATURES ARRAY)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = 3.12
PACKAGES = ('scikit-learn', 'joblib', 'numpy')
IMPORTS = ('@MODEL_STAGE/model.joblib')
HANDLER = 'handler'
AS $$
import sys
import joblib
import numpy as np
import_dir = sys._xoptions["snowflake_import_directory"]
model = joblib.load(open(import_dir + "/model.joblib", "rb"))
def handler(features):
x = np.array([features], dtype=float)
return float(model.predict_proba(x)[0, 1])
$$;Then create a task that consumes the stream:
CREATE OR REPLACE TASK SCORE_CHANGED_CUSTOMERS
WAREHOUSE = ML_WH
WHEN SYSTEM$STREAM_HAS_DATA('RAW_FEATURES_STREAM')
AS
MERGE INTO SCORED_CUSTOMERS AS target
USING (
SELECT
ID,
ARRAY_CONSTRUCT(
AGE,
RX_COST_30D,
VISITS_90D,
COPAY_VOLATILITY
) AS FEATURES
FROM RAW_FEATURES_STREAM
WHERE METADATA$ACTION = 'INSERT'
QUALIFY ROW_NUMBER() OVER (
PARTITION BY ID
ORDER BY UPDATED_AT DESC
) = 1
) AS source
ON target.ID = source.ID
WHEN MATCHED THEN UPDATE SET
target.PREDICTED_PROBABILITY = PREDICT_PROBA(source.FEATURES),
target.SCORED_AT = CURRENT_TIMESTAMP(),
target.MODEL_VERSION = 'risk_model_2026_04_27'
WHEN NOT MATCHED THEN INSERT (
ID,
PREDICTED_PROBABILITY,
SCORED_AT,
MODEL_VERSION
) VALUES (
source.ID,
PREDICT_PROBA(source.FEATURES),
CURRENT_TIMESTAMP(),
'risk_model_2026_04_27'
);New Snowflake tasks start suspended, so resume the task:
ALTER TASK SCORE_CHANGED_CUSTOMERS RESUME;The important correction is the source:
FROM RAW_FEATURES_STREAMnot:
FROM RAW_FEATURESThat is where the CDC metadata lives.
Snowflake operator notes#
Stage the model artifact:
PUT file://model.joblib @MODEL_STAGE AUTO_COMPRESS = FALSE;Grant:
- usage on warehouse
- usage on database and schema
- read on stage
- usage on function
- select on source table or stream as appropriate
- ownership or operate privileges for task administration
Monitor:
- task history
- stream consumption
- rows scored
- UDF errors
- warehouse credit usage
- score distribution
- model version distribution
- calibration drift when labels arrive
Snowflake Python UDFs are convenient, but convenience is not free. Per-row UDF execution can become expensive or slow if the model is heavy or the row volume is large. For high-volume scoring, consider vectorized UDFs, Snowpark batch jobs, or scoring outside the warehouse and writing results back.
BigQuery options#
BigQuery has a different shape.
The main options are:
- BigQuery ML models scored with
ML.PREDICT. - Imported models supported by BigQuery ML, such as TensorFlow, ONNX, TensorFlow Lite, and XGBoost.
- Remote models backed by Vertex AI endpoints, called through
ML.PREDICT. - Scheduled queries for recurring scoring.
- Change history functions for incremental processing, currently Preview.
- A conservative watermark pattern when Preview change history is not appropriate.
The cleanest production pattern depends on where the model lives.
If the model can live in BigQuery ML, use ML.PREDICT directly.
If the model is a custom Python or large model hosted in Vertex AI, create a BigQuery remote model over the Vertex AI endpoint and call ML.PREDICT.
BigQuery remote model example#
BigQuery supports remote models over custom Vertex AI models using CREATE MODEL ... REMOTE WITH CONNECTION.
The docs emphasize that remote models are useful when a model is too large to import into BigQuery or when you want one inference endpoint for online, batch, and micro-batch use cases.
Example:
CREATE OR REPLACE MODEL `PROJECT_ID.ml.risk_remote_model`
INPUT (
age FLOAT64,
rx_cost_30d FLOAT64,
visits_90d FLOAT64,
copay_volatility FLOAT64
)
OUTPUT (
predicted_probability FLOAT64
)
REMOTE WITH CONNECTION `PROJECT_ID.us.vertex_connection`
OPTIONS (
ENDPOINT = 'https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID'
);Score a table:
SELECT
id,
predicted_probability,
CURRENT_TIMESTAMP() AS scored_at,
'risk_model_2026_04_27' AS model_version
FROM ML.PREDICT(
MODEL `PROJECT_ID.ml.risk_remote_model`,
(
SELECT
id,
age,
rx_cost_30d,
visits_90d,
copay_volatility
FROM `PROJECT_ID.features.raw_features`
)
);Then merge results:
MERGE `PROJECT_ID.ml.scored_customers` AS target
USING (
SELECT
id,
predicted_probability,
CURRENT_TIMESTAMP() AS scored_at,
'risk_model_2026_04_27' AS model_version
FROM ML.PREDICT(
MODEL `PROJECT_ID.ml.risk_remote_model`,
(
SELECT
id,
age,
rx_cost_30d,
visits_90d,
copay_volatility
FROM `PROJECT_ID.features.raw_features`
WHERE updated_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
)
)
) AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET
predicted_probability = source.predicted_probability,
scored_at = source.scored_at,
model_version = source.model_version
WHEN NOT MATCHED THEN INSERT (
id,
predicted_probability,
scored_at,
model_version
) VALUES (
source.id,
source.predicted_probability,
source.scored_at,
source.model_version
);Schedule this as a BigQuery scheduled query.
BigQuery scheduled queries are written in GoogleSQL and can include DDL and DML. The schedule time is converted to UTC, which matters when stakeholders expect local-time behavior.
BigQuery incremental choices#
There are two common incremental patterns.
The conservative pattern is a watermark:
WHERE updated_at > (
SELECT COALESCE(MAX(scored_at), TIMESTAMP '1970-01-01')
FROM `PROJECT_ID.ml.scored_customers`
)This is simple, but it assumes updated_at is reliable and that late-arriving records are handled intentionally. In real systems, use a lookback window:
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)Then MERGE makes the operation idempotent.
The newer option is BigQuery change history.
BigQuery change history provides APPENDS and CHANGES functions. APPENDS returns rows appended during a time range. CHANGES returns rows changed during a time range, but the table must have enable_change_history set to TRUE. The feature is currently Preview, which means it belongs behind a deliberate platform decision, not casual copy-paste.
Use change history when:
- the account accepts Preview feature risk
- storage and compute costs are understood
- changes include updates and deletes, not only appends
- IAM and row or column access policies are compatible
- the scoring workflow needs true changed-row processing
Use watermarks when:
- the source table has reliable update timestamps
- append-heavy behavior dominates
- operational simplicity matters more than full CDC semantics
BigQuery operator notes#
For remote models:
- create a BigQuery connection
- grant the connection service account access to Vertex AI
- match remote model input and output fields to the endpoint contract
- keep region alignment in mind
- monitor endpoint latency and BigQuery job cost
For scheduled queries:
- enable BigQuery Data Transfer Service requirements
- use a service account with least privilege
- write idempotent
MERGEstatements - parameterize windows carefully
- alert on failures
For scoring:
- log model version
- log scored timestamp
- preserve input snapshot or feature hash
- monitor row deltas
- monitor score distribution
- recompute Brier and log loss when labels arrive
Snowflake versus BigQuery#
Snowflake Streams and Tasks feel like a native CDC pipeline:
- stream captures table changes
- task triggers on stream data or schedule
- Python UDF or Snowpark code scores
- MERGE writes output
BigQuery feels more query orchestration oriented:
- scheduled query runs SQL
ML.PREDICTscores BigQuery ML, imported, or remote Vertex AI models- change history can provide incremental deltas in Preview
- watermark patterns remain practical and stable
Neither is universally better.
Snowflake is strong when stream-driven tasks and in-warehouse Python are the natural fit.
BigQuery is strong when BigQuery ML, scheduled SQL, and Vertex AI remote models match the architecture.
The decision should follow:
- where the data lives
- where the model lives
- latency requirement
- volume
- governance boundary
- cost model
- operational ownership
The calibration connection#
In-warehouse inference does not remove the need for model controls.
If the score is a probability, monitor calibration.
At minimum:
- score distribution by run
- positive rate by probability bucket when labels arrive
- Brier score
- log loss
- thresholded decision rate
- segment calibration
The post Probability Calibration Is an Operating Control covers this side of the work. The warehouse runbook should carry that monitoring into production.
The warehouse can make scoring easier.
It does not make the score trustworthy by itself.
Failure modes#
The first Snowflake failure mode is querying the base table and expecting stream metadata columns.
Query the stream.
The second is treating updates as simple rows without understanding INSERT and DELETE pairs.
For current-state scoring, consume the INSERT side.
The third is using per-row Python UDFs for a model that should be batch-scored elsewhere.
Convenience can become a credit bill.
The first BigQuery failure mode is assuming CDC is identical to Snowflake streams.
It is not. BigQuery change history exists, but it is Preview. Watermarks are often the practical default.
The second is creating a remote model without respecting endpoint schema, region, and permissions.
The third is scheduling a query that is not idempotent.
If it runs twice, the output should still be correct.
The point#
In-warehouse inference is not a shortcut around MLOps.
It is a deployment pattern.
Use it when the decision is batch-friendly, the data already lives in the warehouse, and the score can be written back as a governed table.
Snowflake gives you Streams and Tasks.
BigQuery gives you Scheduled Queries, BigQuery ML, remote Vertex AI models, and change history where appropriate.
The engineering standard is the same in both:
Score only what changed when possible.
Make the scoring job idempotent.
Log model version and scored time.
Monitor row counts, failures, cost, and calibration.
Keep the score close to the decision it supports.
Related notes#
- Probability Calibration Is an Operating Control
- A scikit-learn Pipeline for Calibrated Decisions
- AI Strategy Starts Before the Model
- What a Data Strategist Actually Does
- The Preprocessing Boundary Between scikit-learn and PyTorch
Sources#
- Snowflake, CREATE STREAM
- Snowflake, Introduction to Streams
- Snowflake, CREATE TASK
- Snowflake, Introduction to Tasks
- Snowflake, Using third-party packages in Python UDFs
- BigQuery, ML.PREDICT
- BigQuery, Remote models over custom Vertex AI models
- BigQuery, Make predictions with remote models on Vertex AI
- BigQuery, Work with change history
- BigQuery, Scheduled queries