Compliant GCP Platform Playbook for Analytics and ML

A sanitized GCP platform case study where compliance, analytics delivery, and ML feature access had to be designed as one release path instead of three disconnected workstreams.

By Jovani Pink January 12, 2026 5 min — Platform & AI Engineering

Outcome focus: Reduced governed dataset onboarding from weeks to days in the sanitized pattern while preserving auditability, cost visibility, and promotion rules for analytics and ML use cases.

The first platform design looked clean until the compliance review.

Analytics wanted faster access to governed BigQuery datasets. ML wanted stable feature tables with known lineage. Governance wanted policy tags, approval evidence, and auditability. Platform wanted a small number of reusable patterns instead of a custom exception for every team.

The trap was sequencing those needs. Build the platform first, add controls later. Let analysts move fast, then make it compliant. Give ML a sandbox, then figure out promotion. Every version of that plan created the same risk: the system would work by bypassing the rules it was supposed to enforce.

The better design treated compliance as a platform feature.

This is a sanitized case study. The dataset names, volumes, and timings are rounded or illustrative, but the architecture tradeoffs are the ones that matter.

Before state#

The starting point had three lanes, but only one of them was explicit.

  1. Governed reporting. Slow, reviewed, and trusted once it shipped.
  2. Ad hoc analytics. Fast, useful, and often hard to trace later.
  3. ML experimentation. Productive in notebooks, brittle at promotion time.

Nobody intended to create shadow paths. They emerged because the official path was expensive.

The symptoms were familiar:

  • Sensitive columns were documented, but policy enforcement depended on humans remembering the rule.
  • Dataset promotion meant opening tickets and attaching screenshots.
  • Feature tables were copied from analytics models without a clear owner for drift, freshness, or schema change.
  • Cost reviews happened after a query pattern was already expensive.
  • Access reviews were periodic, not tied to the moment a data product changed.

The constraint was not "be compliant." That is too broad to design against.

The actual constraint was sharper: make the compliant path faster than the workaround.

Design decision#

The platform used two lanes with shared controls:

LanePurposePromotion rule
ExperimentalExploration, notebook work, prototype featuresTime-limited access, no executive reporting, no production ML dependency
GovernedReporting, published marts, ML feature tablesTests, policy tags, owner, release approval, cost guardrail

The important part is that both lanes used the same primitives: BigQuery datasets, service accounts, IAM groups, Dataform release workflows, policy tags, and platform telemetry. The difference was not a different tool. The difference was the promotion contract.

The platform used one promotion path for reporting and ML instead of separate compliance exceptions.

The tradeoff#

The tradeoff was accepting stricter promotion in exchange for faster approved reuse.

The loose alternative was to let teams copy data into their own datasets and ask governance to review later. That is faster for the first request and slower for every request after it. It multiplies policy surfaces, duplicates definitions, and makes access review harder.

The strict alternative was to force every exploratory question through the governed lane. That protects the platform, but it destroys learning speed.

The two-lane design gave both sides a place:

  • experiments could happen without pretending to be production,
  • production assets had explicit ownership and tests,
  • ML feature tables did not get special exemptions,
  • access reviews attached to data products instead of one-off tickets.

Operating artifact: promotion checklist#

The promotion checklist was deliberately short enough to use in a pull request.

CheckRequired evidence
OwnerNamed data product owner and Slack/escalation route
ClassificationPolicy tags on sensitive columns and documented data class
ContractGrain, primary keys, null expectations, and accepted value ranges
TestsDataform assertions for breaking schema and quality changes
AccessIAM group or service account reviewed for least privilege
CostExpected query pattern and monthly cost risk noted
FreshnessSLA and stale-data behavior documented
Downstream useReporting, ML, or operational consumers listed
RollbackLast known good release or disable path available

This checklist did not replace governance. It made governance executable.

Measured outcome#

The sanitized target was not "move faster" in the abstract. It was:

  • reduce governed dataset onboarding from multiple weeks to a few business days for standard patterns,
  • remove recurring manual review for low-risk schema changes that passed policy and test gates,
  • make feature-table promotion use the same path as reporting marts,
  • expose query cost by data product owner,
  • give auditors evidence from the release path instead of reconstructing it from tickets.

The most useful metric was time from "dataset is ready for promotion" to "approved governed asset." That number dropped because the approval conversation moved from "please trust this dataset" to "here is the contract, tests, owner, policy tags, cost expectation, and downstream consumer list."

Not every outcome could be measured cleanly. The risk reduction from fewer shadow datasets is partly inferred. But the operational signals were visible: fewer unclear access tickets, fewer promotion surprises, faster standard approvals, and better cost attribution.

What I would change next time#

I would add cost guardrails earlier.

Security and access usually get the first governance attention, but runaway query cost is also a governance problem. A table that is technically compliant but financially invisible can still damage trust in the platform.

The next version would make these fields part of the first promotion request:

FieldExample
Expected consumerWeekly executive dashboard, daily feature pipeline
Query patternScheduled aggregate, analyst exploration, batch scoring
Cost ownerPlatform, product analytics, ML team
Alert thresholdJob cost or scanned bytes above agreed limit
RemediationPartitioning, materialized table, semantic-layer change, or access review

Compliance platforms work when the safe path is also the easy path. On GCP, that means using BigQuery, IAM, policy tags, Dataform, and telemetry as one operating system, not as disconnected controls.

Back to all writing
On this page
  1. Before state
  2. Design decision
  3. The tradeoff
  4. Operating artifact: promotion checklist
  5. Measured outcome
  6. What I would change next time