Outcome focus: Reduced contract-break risk in the sanitized release pattern by making schema, freshness, cost, and downstream impact checks part of promotion instead of after-the-fact review.
Tags: dataform, bigquery, data contracts, release engineering, gcp
The near miss was small enough to look harmless.
A transformation change renamed a field that only one downstream dashboard was supposed to use. The model built successfully. The changed table looked correct in development. The pull request was easy to approve because the local tests passed.
The problem was that the dashboard was not the only consumer. A scheduled export and an ML feature job also depended on the same field. The break would have reached production if promotion had only checked the changed model.
That incident changed how I think about Dataform release patterns on BigQuery. The release unit is not the SQL file. It is the downstream contract.
This case study is sanitized. Table names, ownership names, and incident counts are illustrative, but the release shape is the one I use.
## Before state
The platform had reasonable tools and weak release boundaries.
- Dataform compiled transformations.
- BigQuery stored raw, curated, and reporting tables.
- Analysts reviewed SQL changes.
- Governance reviewed sensitive data handling.
- Downstream teams learned about breaking changes through Slack or broken jobs.
The missing piece was not testing in general. It was testing the contract that other people depended on.
The constraint was delivery speed. The answer could not be "make every change wait for a committee." The release path had to catch dangerous changes without making routine modeling work unbearable.
## Release lanes
I separate Dataform changes into three lanes.
| Lane | Examples | Gate |
|---|---|---|
| Safe internal change | Refactor CTE, add unused column, improve naming inside a private model | Compile, unit assertions, owner approval |
| Contract change | Rename/remove column, change grain, change semantics, alter freshness expectation | Downstream impact, contract approval, migration plan |
| Cost-risk change | New join path, larger scan, incremental strategy change, backfill | Cost estimate, dry-run bytes, rollback plan |
The lane matters because the approval question changes. A contract change is not risky because SQL is hard. It is risky because someone else built a decision on top of the old shape.
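The table works as a lookup from the kind of change to the gates it must clear. The sketch below assumes that a CI step has already reduced a Dataform diff to a few boolean flags. The `ChangeSummary` fields and gate names are hypothetical, not a Dataform API. The sketch also assumes the safe-lane gates are a baseline that every change passes, with the stricter lanes' gates added on top. That cumulative reading of the table is this sketch's assumption. A change can land in both the contract and cost-risk lanes at once.

```python
from dataclasses import dataclass

# Gates every change passes (the safe internal lane in the table).
BASELINE_GATES = ["compile", "unit_assertions", "owner_approval"]

# Additional gates per risky lane, taken from the table's Gate column.
EXTRA_GATES = {
    "contract": ["downstream_impact", "contract_approval", "migration_plan"],
    "cost_risk": ["cost_estimate", "dry_run_bytes", "rollback_plan"],
}


@dataclass
class ChangeSummary:
    # Hypothetical flags a CI step might derive from a Dataform diff.
    removed_or_renamed_columns: bool = False
    grain_changed: bool = False
    semantics_changed: bool = False
    freshness_changed: bool = False
    new_join_path: bool = False
    incremental_strategy_changed: bool = False
    backfill: bool = False


def classify(change: ChangeSummary) -> tuple[list[str], list[str]]:
    """Return (lanes, required_gates) for a change."""
    lanes = []
    if (change.removed_or_renamed_columns or change.grain_changed
            or change.semantics_changed or change.freshness_changed):
        lanes.append("contract")
    if change.new_join_path or change.incremental_strategy_changed or change.backfill:
        lanes.append("cost_risk")
    if not lanes:
        lanes.append("safe_internal")
    gates = list(BASELINE_GATES)
    for lane in lanes:
        # The safe lane adds nothing beyond the baseline.
        gates.extend(EXTRA_GATES.get(lane, []))
    return lanes, gates
```

A column rename plus a backfill lands in both risky lanes. It therefore needs a migration plan and a rollback plan, while a CTE refactor stays on the three baseline gates.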
## The tradeoff
This pattern adds friction to contract and cost-risk changes.
The gate exists because the friction belongs where the blast radius is real.
The rejected alternative was a blanket approval process for every model change. That protects the platform by making everyone slower, which eventually pushes teams back into side channels. The other rejected alternative was pure autonomy: let every team own its own models and depend on conventions. That keeps velocity high until a silent contract break makes trust expensive.
The middle path is lane-based release discipline.
Safe internal changes stay fast. Contract changes get explicit. Cost-risk changes get measured before production.
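"Measured before production" can be as simple as comparing BigQuery dry-run bytes for the candidate query against the current model. The sketch below shows this under stated assumptions. `dry_run_bytes` uses the `google-cloud-bigquery` client's dry-run mode (`QueryJobConfig(dry_run=True)` and `total_bytes_processed`) and needs credentials, so it is defined but not called here. The 1.5x growth ratio and the 1 TiB ceiling are hypothetical thresholds, not recommendations.

```python
TIB = 1024**4


def dry_run_bytes(sql: str) -> int:
    """Bytes BigQuery estimates the query would scan. Requires google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # imported lazily so cost_gate runs without the library

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed


def cost_gate(baseline_bytes: int, candidate_bytes: int,
              max_growth_ratio: float = 1.5,
              absolute_limit_bytes: int = 1 * TIB) -> tuple[bool, str]:
    """Return (passed, reason) for a cost-risk change. Thresholds are illustrative."""
    # Hard ceiling, regardless of how expensive the current model already is.
    if candidate_bytes > absolute_limit_bytes:
        return False, f"dry run scans {candidate_bytes / TIB:.2f} TiB, over the absolute limit"
    # Relative growth check; skipped for brand-new models with no baseline.
    if baseline_bytes > 0 and candidate_bytes > baseline_bytes * max_growth_ratio:
        return False, f"scan grows {candidate_bytes / baseline_bytes:.1f}x versus the current model"
    return True, "within budget"
```

Checking relative growth as well as an absolute cap catches a new join path that triples a cheap model's scan. An absolute cap alone would let that through.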
## Contract artifact
For governed models, the contract lived beside the model, not in a wiki.
```yaml
model: orders_daily
owner: revenue-analytics
grain: one row per order date and market
primary_key:
  - order_date
  - market_id
freshness_sla: "daily by 08:00 ET"
breaking_change_requires:
  - owner_approval
  - migration_note
  - downstream_consumer_notification
columns:
  gross_revenue:
    type: numeric
    nullable: false
    semantic: pre-refund gross booked revenue
  net_revenue:
    type: numeric
    nullable: false
    semantic: gross revenue minus refunds and credits
```

The example is illustrative, but the fields are practical. Grain, keys, nullability, freshness, and semantic meaning are the places where "the dashboard looks wrong" usually starts.
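The structural parts of that contract can be checked mechanically against the schema a change would produce. The sketch below assumes the YAML has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It also assumes the candidate schema has been collected as `{column: {"type", "nullable"}}`, perhaps from the dev table's `INFORMATION_SCHEMA.COLUMNS`. `contract_violations` and `promotion_requirements` are hypothetical helpers. They cover columns, types, nullability, and key columns only; semantics and freshness still need a human or a separate check.

```python
# Hypothetical parsed form of the orders_daily contract above.
ORDERS_DAILY_CONTRACT = {
    "model": "orders_daily",
    "primary_key": ["order_date", "market_id"],
    "breaking_change_requires": [
        "owner_approval",
        "migration_note",
        "downstream_consumer_notification",
    ],
    "columns": {
        "gross_revenue": {"type": "numeric", "nullable": False},
        "net_revenue": {"type": "numeric", "nullable": False},
    },
}


def contract_violations(contract, candidate_columns, candidate_primary_key):
    """List breaking differences between a contract and a candidate schema."""
    problems = []
    for name, spec in contract["columns"].items():
        actual = candidate_columns.get(name)
        if actual is None:
            # From a consumer's point of view, a rename is indistinguishable from a removal.
            problems.append(f"{name}: removed or renamed")
            continue
        if actual["type"].lower() != spec["type"].lower():
            problems.append(f"{name}: type {spec['type']} -> {actual['type']}")
        if not spec["nullable"] and actual["nullable"]:
            problems.append(f"{name}: contract says non-nullable, candidate allows nulls")
    # Key columns define the grain: reordering is harmless, changing the set is not.
    if set(candidate_primary_key) != set(contract["primary_key"]):
        problems.append(f"primary key {contract['primary_key']} -> {list(candidate_primary_key)}")
    return problems


def promotion_requirements(contract, problems):
    """Approvals a breaking change must collect before promotion; none if clean."""
    return list(contract["breaking_change_requires"]) if problems else []
```

Consider renaming `gross_revenue` to a hypothetical `gross_rev`. The check reports `gross_revenue: removed or renamed`, and promotion then requires all three items from `breaking_change_requires`. Adding an unused column passes cleanly, which keeps that change in the safe lane.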
## Validation gates
The release gate checked three things before promotion and one immediately after it.
| Gate | Signal | Why it mattered |
|---|---|---|
| Compile | Dataform compile succeeds | Catches syntax and dependency graph issues |
| Contract | Grain, keys, required columns, type expectations | Catches breaking downstream changes |
| Cost | BigQuery dry-run bytes or changed scan pattern | Catches expensive changes before they surprise finance |
| Smoke | Row count, freshness, null spike, downstream job health | Catches regressions immediately after promotion |
The smoke check was intentionally boring. Row counts, freshness, and null spikes catch more real incidents than elegant checks that nobody maintains.
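A boring smoke check fits in one small function. The sketch below assumes the inputs have already been queried: row counts, a last-load timestamp, and per-column null fractions. A `COUNT(*)` and `COUNTIF(col IS NULL)` query plus the table's last-modified time would be one way to get them. `smoke_check` and its thresholds are hypothetical, and the downstream-job-health signal from the table is left out.

```python
from datetime import datetime, timedelta, timezone


def smoke_check(row_count, previous_row_count, last_loaded_at, now, max_staleness,
                null_rates, previous_null_rates,
                max_row_drop=0.2, max_null_increase=0.05):
    """Return smoke-check failures; an empty list means healthy. Thresholds are illustrative."""
    failures = []
    # Row count: flag a drop larger than max_row_drop versus the previous load.
    if previous_row_count > 0 and row_count < previous_row_count * (1 - max_row_drop):
        failures.append(f"row count fell from {previous_row_count} to {row_count}")
    # Freshness: the latest load must fall inside the allowed staleness window.
    if now - last_loaded_at > max_staleness:
        failures.append(f"stale: last load at {last_loaded_at.isoformat()}")
    # Null spike: compare each column's null fraction with the previous load.
    for column, rate in null_rates.items():
        if rate - previous_null_rates.get(column, 0.0) > max_null_increase:
            failures.append(f"null spike in {column}: {rate:.0%}")
    return failures


# Example: a healthy daily load checked five hours after it landed.
now = datetime(2024, 1, 2, 13, 0, tzinfo=timezone.utc)
healthy = smoke_check(
    row_count=1000, previous_row_count=990,
    last_loaded_at=now - timedelta(hours=5), now=now,
    max_staleness=timedelta(hours=26),
    null_rates={"net_revenue": 0.0}, previous_null_rates={"net_revenue": 0.0},
)
```

Each check compares against the previous load rather than a fixed number. That keeps the thresholds meaningful as the table grows.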
## Rollback behavior
Rollback had to be boring too.
For governed models, the rollback plan was one of:
- revert the Dataform release and republish,
- restore a table snapshot,
- disable a downstream schedule,
- publish a compatibility column while consumers migrate,
- freeze a backfill until a cost-risk change is reviewed.
The important decision is made before release: which rollback path is available for this model?
If the answer is "we will figure it out when it breaks," the model is not ready for governed promotion.
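The rule above can be enforced by requiring each governed model's release manifest to name one of the five paths. The sketch below is built on assumptions. The enum member names act as hypothetical manifest keys, and `ready_for_governed_promotion` is a hypothetical check, not a Dataform feature. Free text such as "we will figure it out when it breaks" fails because it matches no known path.

```python
from enum import Enum
from typing import Optional


class RollbackPath(Enum):
    # The five rollback paths listed above; member names are hypothetical manifest keys.
    REVERT_AND_REPUBLISH = "revert the Dataform release and republish"
    RESTORE_SNAPSHOT = "restore a table snapshot"
    DISABLE_DOWNSTREAM_SCHEDULE = "disable a downstream schedule"
    COMPATIBILITY_COLUMN = "publish a compatibility column while consumers migrate"
    FREEZE_BACKFILL = "freeze a backfill until a cost-risk change is reviewed"


def ready_for_governed_promotion(declared_rollback: Optional[str]) -> bool:
    """A governed model is promotable only if it declares a known rollback path."""
    if not declared_rollback:
        return False
    return declared_rollback.strip().upper() in RollbackPath.__members__
```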
## Measured outcome
The most visible outcome was fewer surprise breaks reaching consumers. In this sanitized pattern, the useful measures were:
- contract-break incidents per month,
- percentage of governed models with explicit owners,
- promotion lead time by lane,
- BigQuery dry-run bytes for cost-risk changes,
- recovery time after a failed smoke check.
The point was not to make every number perfect. The point was to make release risk observable.
One useful target: safe internal changes should stay same-day, while contract changes should include a migration plan before production. If those two lanes take the same amount of time, the process is probably too heavy for one of them and too light for the other.
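The lane heuristic can be checked directly from release history. The sketch below assumes each release is recorded as a `(lane, lead_time_hours)` pair. Both helper names are hypothetical. The 25% "about the same" tolerance and reading "same-day" as 24 hours are also assumptions.

```python
from statistics import median


def median_lead_time_by_lane(releases):
    """releases: iterable of (lane, lead_time_hours) pairs from release history."""
    by_lane = {}
    for lane, hours in releases:
        by_lane.setdefault(lane, []).append(hours)
    return {lane: median(times) for lane, times in by_lane.items()}


def lanes_look_miscalibrated(medians, tolerance=0.25):
    """True when safe and contract lanes take about the same time.

    'About the same' means the gap is within `tolerance` of the slower lane's median.
    """
    safe = medians["safe_internal"]
    contract = medians["contract"]
    return abs(contract - safe) <= tolerance * max(safe, contract)


def safe_lane_is_same_day(medians, same_day_hours=24):
    """Assumes 'same-day' means a median lead time of at most 24 hours."""
    return medians["safe_internal"] <= same_day_hours
```

Medians are used instead of means so that one stuck contract change does not make the whole lane look slow.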
## What I would change next time
I would make downstream registration explicit earlier.
The first version inferred consumers from known dashboards, scheduled queries, and feature jobs. That was better than guessing, but it still missed informal dependencies. A simple `consumers.yaml` file per governed model would have made the ownership conversation clearer:
```yaml
model: orders_daily
consumers:
  - name: executive-revenue-dashboard
    owner: revenue-analytics
    dependency: dashboard
  - name: churn-feature-batch
    owner: ml-platform
    dependency: feature_table
  - name: finance-close-export
    owner: finance-ops
    dependency: scheduled_export
```

Data contracts work when they are treated as release artifacts. Dataform and BigQuery are strong enough to support that pattern, but the tools do not create the discipline on their own. The discipline is deciding which changes are safe, which changes are contracts, and which changes need a rollback plan before they touch production.