BigQuery Keys in dbt Are Optimizer Hints, Not Enforcement

Outcome focus: Defined a BigQuery and dbt constraint playbook that keeps optimizer hints, dbt contracts, data tests, compiled SQL review, and INFORMATION_SCHEMA verification in the right order.

A team adds primary keys and foreign keys to a BigQuery star schema and expects the warehouse to start behaving like Postgres.

That is the mistake.

The setup is familiar. dim_customers has one row per customer. fct_orders has one row per order. BI users query a wide view that joins the fact table to dimensions because nobody wants every dashboard author to rewrite the same joins. The view is convenient, but some queries only select fact columns and still pay for joins they do not need.

Then somebody reads Google's post on join optimizations with BigQuery primary and foreign keys. They see join elimination and join reordering. They want dbt to declare the keys.

That instinct is good. The dangerous part is the word "constraint."

In BigQuery, primary and foreign keys are metadata that can help the optimizer. They are not an integrity wall. Google's primary and foreign keys guide is explicit that BigQuery does not enforce those key constraints, and that queries over tables with violated constraints can return incorrect results. If the metadata says every customer key is valid but the data disagrees, the optimizer is allowed to trust the metadata.

So the dbt question is not only "can I emit the DDL?"

The better question is: can I publish optimizer metadata only after I have a testable data contract?

The Mental Model#

I treat BigQuery keys as a three-layer contract.

dbt describes the model, BigQuery stores optimizer metadata, and tests remain the enforcement layer.

The layers do different jobs.

Layer	What it gives you	What it does not give you
dbt contract	Column names, data types, and supported constraints in generated DDL	Runtime uniqueness or referential integrity for unenforced platforms
BigQuery key metadata	Join elimination and join reordering opportunities	Protection from bad rows
dbt data tests	Actual checks for uniqueness, nulls, and relationships	Automatic query optimization
`INFORMATION_SCHEMA`	Proof that the warehouse has the declared metadata	Proof that the data still satisfies the metadata

This is the tradeoff. BigQuery can move faster because it does not check every insert or table rebuild against relational constraints. You get optimizer signal without transaction-style enforcement. That is useful in analytical systems, but only if the pipeline owns the enforcement elsewhere.

What dbt Can Do Natively#

dbt model contracts were introduced with dbt Core 1.5 as part of a broader model governance release. I would still use a current dbt-core and dbt-bigquery pair, but I would not frame this as a 1.9-only feature.

The current dbt docs say constraints require supported materializations and an enforced model contract. For the BigQuery adapter, that means table and incremental models are the practical targets. Views can participate in limited contract behavior around columns and types, but constraints are not applied to views. Incremental models with contracts also need an intentional on_schema_change policy, usually append_new_columns or fail.

A small native setup looks like this:

models/marts/orders.yml

version: 2
 
models:
  - name: dim_customers
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int64
        constraints:
          - type: not_null
          - type: primary_key
            warn_unenforced: false
        data_tests:
          - not_null
          - unique
      - name: customer_name
        data_type: string
 
  - name: fct_orders
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: int64
        constraints:
          - type: not_null
        data_tests:
          - not_null
          - unique
      - name: customer_id
        data_type: int64
        constraints:
          - type: foreign_key
            expression: "references {{ ref('dim_customers') }}(customer_id)"
            warn_unenforced: false
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('dim_customers')
                field: customer_id
      - name: order_created_at
        data_type: timestamp

Two things are doing work here.

First, contract.enforced: true makes dbt check that the model output matches the declared columns and data types before materialization. The dbt model contracts docs say an enforced contract must include every column name and data type.

Second, the data_tests are not decorative. BigQuery can define primary and foreign keys, but it does not enforce them. The unique, not_null, and relationships tests are still the integrity gate. The dbt constraints docs make the same distinction: BigQuery can define primary_key and foreign_key, but those keys are not enforced.

I also do not use a BigQuery unique constraint here. Keep uniqueness as a dbt test. BigQuery key metadata is about PRIMARY KEY and FOREIGN KEY, and dbt's adapter support is platform-specific enough that the compiled SQL deserves inspection.

Inspect the Compiled SQL#

The answer I would not trust is: "Add constraints in YAML and move on."

The first run should include a compile check:

dbt compile --select dim_customers fct_orders
rg -n "primary key|foreign key|not enforced|references" target/run target/compiled

You are looking for BigQuery DDL shaped like this:

expected-shape.sql

create or replace table `analytics_prod`.`marts`.`fct_orders`
(
  order_id int64 not null,
  customer_id int64 references `analytics_prod`.`marts`.`dim_customers`(customer_id) not enforced,
  order_created_at timestamp
)
as (
  select order_id, customer_id, order_created_at
  from `analytics_prod`.`staging`.`stg_orders`
);

or model-level DDL shaped like this:

expected-model-level-shape.sql

create or replace table `analytics_prod`.`marts`.`fct_orders`
(
  order_id int64 not null,
  customer_id int64 not null,
  order_created_at timestamp,
  foreign key (customer_id)
    references `analytics_prod`.`marts`.`dim_customers`(customer_id)
    not enforced
)
as (
  select order_id, customer_id, order_created_at
  from `analytics_prod`.`staging`.`stg_orders`
);

I do not care which valid shape the adapter emits as much as I care that the intended metadata reaches BigQuery and points at the right environment. Hard-coded project and dataset names are how development constraints leak into production or production constraints leak into preview.

Use ref() or a macro that resolves relations through dbt. Do not paste a static production table path into a foreign key expression unless you mean to bind every environment to that table.

The Explicit DDL Fallback#

If the compiled SQL does not contain the constraint you expected, or if you want the BigQuery DDL to be painfully obvious, use a post-hook. This is also useful when adapter behavior changes across versions or when you are rolling the pattern into an existing project cautiously.

The straightforward version works for a table that is fully rebuilt:

models/marts/dim_customers.sql

{{
  config(
    materialized = "table",
    post_hook = [
      "ALTER TABLE {{ this }} ADD PRIMARY KEY (customer_id) NOT ENFORCED"
    ]
  )
}}
 
select
  customer_id,
  customer_name
from {{ ref("stg_customers") }}

The fact table can add a named foreign key:

models/marts/fct_orders.sql

{{
  config(
    materialized = "table",
    post_hook = [
      "ALTER TABLE {{ this }} ADD CONSTRAINT IF NOT EXISTS fk_fct_orders_customer FOREIGN KEY (customer_id) REFERENCES {{ ref('dim_customers') }}(customer_id) NOT ENFORCED"
    ]
  )
}}
 
select
  order_id,
  customer_id,
  order_created_at
from {{ ref("stg_orders") }}

That mirrors BigQuery's GoogleSQL DDL: ALTER TABLE ... ADD PRIMARY KEY (...) NOT ENFORCED and ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY (...) REFERENCES ... NOT ENFORCED.

The hook is not a license to ignore idempotency. A table materialization that recreates the table can safely re-add a primary key after the rebuild. An incremental model may run the same hook against an existing table. A primary key add can fail if the primary key already exists.

For incremental models, make the hook conditional.

macros/bigquery_add_primary_key_if_missing.sql

{% macro bigquery_add_primary_key_if_missing(column_list) %}
  {% set columns = column_list | join(", ") %}
  IF NOT EXISTS (
    SELECT 1
    FROM `{{ this.database }}`.`{{ this.schema }}`.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
    WHERE table_name = '{{ this.identifier }}'
      AND constraint_type = 'PRIMARY KEY'
  ) THEN
    ALTER TABLE {{ this }}
    ADD PRIMARY KEY ({{ columns }}) NOT ENFORCED;
  END IF;
{% endmacro %}

Then call the macro from the model:

models/marts/dim_customers.sql

{{
  config(
    materialized = "incremental",
    on_schema_change = "fail",
    post_hook = "{{ bigquery_add_primary_key_if_missing(['customer_id']) }}"
  )
}}
 
select
  customer_id,
  customer_name
from {{ ref("stg_customers") }}

I would keep foreign keys named and use IF NOT EXISTS where the DDL supports it:

models/marts/fct_orders.sql

{{
  config(
    materialized = "incremental",
    on_schema_change = "fail",
    post_hook = [
      "ALTER TABLE {{ this }} ADD CONSTRAINT IF NOT EXISTS fk_fct_orders_customer FOREIGN KEY (customer_id) REFERENCES {{ ref('dim_customers') }}(customer_id) NOT ENFORCED"
    ]
  )
}}
 
select
  order_id,
  customer_id,
  order_created_at
from {{ ref("stg_orders") }}

That is more ceremony than the native contract path. The ceremony buys explicitness and idempotency.

Verify the Warehouse, Not the Intention#

After dbt run, verify BigQuery's catalog.

verify_table_constraints.sql

select
  constraint_name,
  constraint_type,
  table_name,
  enforced
from `analytics_prod`.marts.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
where table_name in ('dim_customers', 'fct_orders')
order by table_name, constraint_type;

BigQuery's TABLE_CONSTRAINTS view is dataset-scoped and exposes whether a constraint is a primary key or foreign key. Its enforced value is currently NO for these key constraints.

Then verify the constrained columns:

verify_key_columns.sql

select
  constraint_name,
  table_name,
  column_name,
  ordinal_position
from `analytics_prod`.marts.INFORMATION_SCHEMA.KEY_COLUMN_USAGE
where table_name in ('dim_customers', 'fct_orders')
order by table_name, constraint_name, ordinal_position;

And verify the referenced side:

verify_references.sql

select
  constraint_name,
  unique_constraint_name,
  match_option,
  update_rule,
  delete_rule
from `analytics_prod`.marts.INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
order by constraint_name;

I prefer dataset-scoped verification even when I use fully qualified table references in DDL. It keeps the check aligned with the dataset that dbt just built. If you are using multiple datasets for marts, staging, previews, or tenant separation, do not trust memory. Check the catalog that the run actually wrote.

The Tests That Still Matter#

The minimum test set is boring:

models/marts/orders.yml

models:
  - name: dim_customers
    columns:
      - name: customer_id
        data_tests:
          - not_null
          - unique
 
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          - not_null
          - unique
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('dim_customers')
                field: customer_id

If those tests are too expensive to run on the whole table, that is a data engineering problem to solve directly. Use incremental validation windows, sampled early warning checks, audit tables, or a scheduled full validation for the published marts. But do not declare optimizer metadata and skip the proof.

The scary failure is not a failed build.

The scary failure is a successful optimized query over untrue metadata.

Measure Whether the Keys Help#

Keys are not a replacement for partitioning, clustering, pruning, or sane view design. They are one more signal to the optimizer.

I would test a before and after query that matches the claimed benefit:

join-elimination-candidate.sql

explain
select
  o.order_id,
  o.order_created_at
from `analytics_prod`.marts.fct_orders as o
join `analytics_prod`.marts.dim_customers as c
  on o.customer_id = c.customer_id;

This query only selects columns from the fact table. If the foreign key and primary key metadata are valid, it is a candidate for join elimination. Google's BigQuery post also describes join reordering when elimination is not possible.

The measurement checklist is:

Check	Question
Compiled DDL	Did dbt render `PRIMARY KEY`, `FOREIGN KEY`, and `NOT ENFORCED` where expected?
Catalog	Do `TABLE_CONSTRAINTS`, `KEY_COLUMN_USAGE`, and `REFERENTIAL_CONSTRAINTS` show the intended relationship?
Integrity	Do dbt `unique`, `not_null`, and `relationships` tests pass for the declared keys?
Plan	Did the query plan change in the expected direction?
Cost	Did bytes processed, slot time, or shuffle move enough to justify the constraint maintenance?
Safety	Can the team detect duplicates or orphaned rows before the optimizer trusts bad metadata?

If the plan does not improve, remove the constraint or leave it only if the metadata helps your catalog. The existence of a key is not proof that it is doing useful optimizer work.

When I Would Use Each Path#

Situation	Preferred path
New table model with stable columns	dbt contract constraints plus data tests
Existing project where adapter output is unclear	Compile inspection, then explicit post-hook if needed
Incremental model	Idempotent hook or macro plus `on_schema_change: fail`
Fast-changing staging model	Data tests only; skip warehouse keys until the model is stable
Public mart used by BI and downstream jobs	Contract, keys, tests, catalog verification, and plan measurement

This connects to the broader release-contract point in Dataform + BigQuery Governance Release Patterns. A key is not just a modeling preference once downstream systems depend on it. It is part of the publication contract.

The draft answer was right that dbt can declare these constraints and that post_hook is a useful fallback. The part I would sharpen is the operating boundary: BigQuery keys are optimizer hints with consequences. dbt contracts describe the intended shape. dbt tests prove the data still deserves the metadata. INFORMATION_SCHEMA proves the warehouse received the declaration.

Use all three, or do not publish the keys.