Outcome focus: Defined a BigQuery and dbt constraint playbook that keeps optimizer hints, dbt contracts, data tests, compiled SQL review, and INFORMATION_SCHEMA verification in the right order.
bigquerydbtdata contractsanalytics engineeringquery optimizationdata quality
A team adds primary keys and foreign keys to a BigQuery star schema and expects the warehouse to start behaving like Postgres.
That is the mistake.
The setup is familiar. dim_customers has one row per customer. fct_orders has one row per order. BI users query a wide view that joins the fact table to dimensions because nobody wants every dashboard author to rewrite the same joins. The view is convenient, but some queries only select fact columns and still pay for joins they do not need.
Then somebody reads Google's post on join optimizations with BigQuery primary and foreign keys. They see join elimination and join reordering. They want dbt to declare the keys.
That instinct is good. The dangerous part is the word "constraint."
In BigQuery, primary and foreign keys are metadata that can help the optimizer. They are not an integrity wall. Google's primary and foreign keys guide is explicit that BigQuery does not enforce those key constraints, and that queries over tables with violated constraints can return incorrect results. If the metadata says every customer key is valid but the data disagrees, the optimizer is allowed to trust the metadata.
So the dbt question is not only "can I emit the DDL?"
The better question is: can I publish optimizer metadata only after I have a testable data contract?
The Mental Model#
I treat BigQuery keys as a three-layer contract.
The layers do different jobs.
| Layer | What it gives you | What it does not give you |
|---|---|---|
| dbt contract | Column names, data types, and supported constraints in generated DDL | Runtime uniqueness or referential integrity for unenforced platforms |
| BigQuery key metadata | Join elimination and join reordering opportunities | Protection from bad rows |
| dbt data tests | Actual checks for uniqueness, nulls, and relationships | Automatic query optimization |
INFORMATION_SCHEMA | Proof that the warehouse has the declared metadata | Proof that the data still satisfies the metadata |
This is the tradeoff. BigQuery can move faster because it does not check every insert or table rebuild against relational constraints. You get optimizer signal without transaction-style enforcement. That is useful in analytical systems, but only if the pipeline owns the enforcement elsewhere.
What dbt Can Do Natively#
dbt model contracts were introduced with dbt Core 1.5 as part of a broader model governance release. I would still use a current dbt-core and dbt-bigquery pair, but I would not frame this as a 1.9-only feature.
The current dbt docs say constraints require supported materializations and an enforced model contract. For the BigQuery adapter, that means table and incremental models are the practical targets. Views can participate in limited contract behavior around columns and types, but constraints are not applied to views. Incremental models with contracts also need an intentional on_schema_change policy, usually append_new_columns or fail.
A small native setup looks like this:
version: 2
models:
- name: dim_customers
config:
materialized: table
contract:
enforced: true
columns:
- name: customer_id
data_type: int64
constraints:
- type: not_null
- type: primary_key
warn_unenforced: false
data_tests:
- not_null
- unique
- name: customer_name
data_type: string
- name: fct_orders
config:
materialized: table
contract:
enforced: true
columns:
- name: order_id
data_type: int64
constraints:
- type: not_null
data_tests:
- not_null
- unique
- name: customer_id
data_type: int64
constraints:
- type: foreign_key
expression: "references {{ ref('dim_customers') }}(customer_id)"
warn_unenforced: false
data_tests:
- not_null
- relationships:
arguments:
to: ref('dim_customers')
field: customer_id
- name: order_created_at
data_type: timestampTwo things are doing work here.
First, contract.enforced: true makes dbt check that the model output matches the declared columns and data types before materialization. The dbt model contracts docs say an enforced contract must include every column name and data type.
Second, the data_tests are not decorative. BigQuery can define primary and foreign keys, but it does not enforce them. The unique, not_null, and relationships tests are still the integrity gate. The dbt constraints docs make the same distinction: BigQuery can define primary_key and foreign_key, but those keys are not enforced.
I also do not use a BigQuery unique constraint here. Keep uniqueness as a dbt test. BigQuery key metadata is about PRIMARY KEY and FOREIGN KEY, and dbt's adapter support is platform-specific enough that the compiled SQL deserves inspection.
Inspect the Compiled SQL#
The answer I would not trust is: "Add constraints in YAML and move on."
The first run should include a compile check:
dbt compile --select dim_customers fct_orders
rg -n "primary key|foreign key|not enforced|references" target/run target/compiledYou are looking for BigQuery DDL shaped like this:
create or replace table `analytics_prod`.`marts`.`fct_orders`
(
order_id int64 not null,
customer_id int64 references `analytics_prod`.`marts`.`dim_customers`(customer_id) not enforced,
order_created_at timestamp
)
as (
select order_id, customer_id, order_created_at
from `analytics_prod`.`staging`.`stg_orders`
);or model-level DDL shaped like this:
create or replace table `analytics_prod`.`marts`.`fct_orders`
(
order_id int64 not null,
customer_id int64 not null,
order_created_at timestamp,
foreign key (customer_id)
references `analytics_prod`.`marts`.`dim_customers`(customer_id)
not enforced
)
as (
select order_id, customer_id, order_created_at
from `analytics_prod`.`staging`.`stg_orders`
);I do not care which valid shape the adapter emits as much as I care that the intended metadata reaches BigQuery and points at the right environment. Hard-coded project and dataset names are how development constraints leak into production or production constraints leak into preview.
Use ref() or a macro that resolves relations through dbt. Do not paste a static production table path into a foreign key expression unless you mean to bind every environment to that table.
The Explicit DDL Fallback#
If the compiled SQL does not contain the constraint you expected, or if you want the BigQuery DDL to be painfully obvious, use a post-hook. This is also useful when adapter behavior changes across versions or when you are rolling the pattern into an existing project cautiously.
The straightforward version works for a table that is fully rebuilt:
{{
config(
materialized = "table",
post_hook = [
"ALTER TABLE {{ this }} ADD PRIMARY KEY (customer_id) NOT ENFORCED"
]
)
}}
select
customer_id,
customer_name
from {{ ref("stg_customers") }}The fact table can add a named foreign key:
{{
config(
materialized = "table",
post_hook = [
"ALTER TABLE {{ this }} ADD CONSTRAINT IF NOT EXISTS fk_fct_orders_customer FOREIGN KEY (customer_id) REFERENCES {{ ref('dim_customers') }}(customer_id) NOT ENFORCED"
]
)
}}
select
order_id,
customer_id,
order_created_at
from {{ ref("stg_orders") }}That mirrors BigQuery's GoogleSQL DDL: ALTER TABLE ... ADD PRIMARY KEY (...) NOT ENFORCED and ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY (...) REFERENCES ... NOT ENFORCED.
The hook is not a license to ignore idempotency. A table materialization that recreates the table can safely re-add a primary key after the rebuild. An incremental model may run the same hook against an existing table. A primary key add can fail if the primary key already exists.
For incremental models, make the hook conditional.
{% macro bigquery_add_primary_key_if_missing(column_list) %}
{% set columns = column_list | join(", ") %}
IF NOT EXISTS (
SELECT 1
FROM `{{ this.database }}`.`{{ this.schema }}`.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE table_name = '{{ this.identifier }}'
AND constraint_type = 'PRIMARY KEY'
) THEN
ALTER TABLE {{ this }}
ADD PRIMARY KEY ({{ columns }}) NOT ENFORCED;
END IF;
{% endmacro %}Then call the macro from the model:
{{
config(
materialized = "incremental",
on_schema_change = "fail",
post_hook = "{{ bigquery_add_primary_key_if_missing(['customer_id']) }}"
)
}}
select
customer_id,
customer_name
from {{ ref("stg_customers") }}I would keep foreign keys named and use IF NOT EXISTS where the DDL supports it:
{{
config(
materialized = "incremental",
on_schema_change = "fail",
post_hook = [
"ALTER TABLE {{ this }} ADD CONSTRAINT IF NOT EXISTS fk_fct_orders_customer FOREIGN KEY (customer_id) REFERENCES {{ ref('dim_customers') }}(customer_id) NOT ENFORCED"
]
)
}}
select
order_id,
customer_id,
order_created_at
from {{ ref("stg_orders") }}That is more ceremony than the native contract path. The ceremony buys explicitness and idempotency.
Verify the Warehouse, Not the Intention#
After dbt run, verify BigQuery's catalog.
select
constraint_name,
constraint_type,
table_name,
enforced
from `analytics_prod`.marts.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
where table_name in ('dim_customers', 'fct_orders')
order by table_name, constraint_type;BigQuery's TABLE_CONSTRAINTS view is dataset-scoped and exposes whether a constraint is a primary key or foreign key. Its enforced value is currently NO for these key constraints.
Then verify the constrained columns:
select
constraint_name,
table_name,
column_name,
ordinal_position
from `analytics_prod`.marts.INFORMATION_SCHEMA.KEY_COLUMN_USAGE
where table_name in ('dim_customers', 'fct_orders')
order by table_name, constraint_name, ordinal_position;And verify the referenced side:
select
constraint_name,
unique_constraint_name,
match_option,
update_rule,
delete_rule
from `analytics_prod`.marts.INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
order by constraint_name;I prefer dataset-scoped verification even when I use fully qualified table references in DDL. It keeps the check aligned with the dataset that dbt just built. If you are using multiple datasets for marts, staging, previews, or tenant separation, do not trust memory. Check the catalog that the run actually wrote.
The Tests That Still Matter#
The minimum test set is boring:
models:
- name: dim_customers
columns:
- name: customer_id
data_tests:
- not_null
- unique
- name: fct_orders
columns:
- name: order_id
data_tests:
- not_null
- unique
- name: customer_id
data_tests:
- not_null
- relationships:
arguments:
to: ref('dim_customers')
field: customer_idIf those tests are too expensive to run on the whole table, that is a data engineering problem to solve directly. Use incremental validation windows, sampled early warning checks, audit tables, or a scheduled full validation for the published marts. But do not declare optimizer metadata and skip the proof.
The scary failure is not a failed build.
The scary failure is a successful optimized query over untrue metadata.
Measure Whether the Keys Help#
Keys are not a replacement for partitioning, clustering, pruning, or sane view design. They are one more signal to the optimizer.
I would test a before and after query that matches the claimed benefit:
explain
select
o.order_id,
o.order_created_at
from `analytics_prod`.marts.fct_orders as o
join `analytics_prod`.marts.dim_customers as c
on o.customer_id = c.customer_id;This query only selects columns from the fact table. If the foreign key and primary key metadata are valid, it is a candidate for join elimination. Google's BigQuery post also describes join reordering when elimination is not possible.
The measurement checklist is:
| Check | Question |
|---|---|
| Compiled DDL | Did dbt render PRIMARY KEY, FOREIGN KEY, and NOT ENFORCED where expected? |
| Catalog | Do TABLE_CONSTRAINTS, KEY_COLUMN_USAGE, and REFERENTIAL_CONSTRAINTS show the intended relationship? |
| Integrity | Do dbt unique, not_null, and relationships tests pass for the declared keys? |
| Plan | Did the query plan change in the expected direction? |
| Cost | Did bytes processed, slot time, or shuffle move enough to justify the constraint maintenance? |
| Safety | Can the team detect duplicates or orphaned rows before the optimizer trusts bad metadata? |
If the plan does not improve, remove the constraint or leave it only if the metadata helps your catalog. The existence of a key is not proof that it is doing useful optimizer work.
When I Would Use Each Path#
| Situation | Preferred path |
|---|---|
| New table model with stable columns | dbt contract constraints plus data tests |
| Existing project where adapter output is unclear | Compile inspection, then explicit post-hook if needed |
| Incremental model | Idempotent hook or macro plus on_schema_change: fail |
| Fast-changing staging model | Data tests only; skip warehouse keys until the model is stable |
| Public mart used by BI and downstream jobs | Contract, keys, tests, catalog verification, and plan measurement |
This connects to the broader release-contract point in Dataform + BigQuery Governance Release Patterns. A key is not just a modeling preference once downstream systems depend on it. It is part of the publication contract.
The draft answer was right that dbt can declare these constraints and that post_hook is a useful fallback. The part I would sharpen is the operating boundary: BigQuery keys are optimizer hints with consequences. dbt contracts describe the intended shape. dbt tests prove the data still deserves the metadata. INFORMATION_SCHEMA proves the warehouse received the declaration.
Use all three, or do not publish the keys.