AI Strategy Starts Before the Model

A practical AI strategy framework with a worked example that connects business levers, data readiness, pilots, evaluation, governance, deployment, and operating metrics.

By Jovani Pink | November 4, 2025 | 15 min | Platform & AI Engineering

Outcome focus: Defined an end-to-end AI strategy playbook and worked example that ties data readiness, use-case selection, model development, governance, deployment, and operating ownership to measurable business outcomes.

The worst AI strategy starts with a model.

Not because models are unimportant. They matter. But the model is rarely the first constraint. The first constraint is usually that the business has not decided which decision should change, what metric proves the change mattered, what data is trusted enough to support the decision, and who will own the workflow after the demo ends.

That is where most AI efforts get blurry.

The company says it wants AI. A few use cases appear. Someone builds a prototype. The demo looks impressive. Then the work slows down. The data is incomplete. The workflow is unclear. Legal has questions. The model cannot be monitored. The users do not trust the output. The metric never moves. Six months later, people say the organization is "not ready for AI," which is sometimes true, but not precise enough to be useful.

The better approach is to treat AI like a serious transformation effort.

Start from business levers.

Prove value small.

Industrialize only what works.

Govern the risk before it becomes an incident.

That is the whole strategy in plain language.

Start with business levers, not use cases

"Use case" is one of those phrases that sounds practical while hiding a lot of work.

A use case is not enough.

The better first question is:

If this works, what changes in the business?

Revenue might change through conversion, cross-sell, retention, pricing, basket size, win rate, or customer lifetime value.

Cost might change through automation, fewer manual reviews, lower rework, faster handling time, reduced waste, improved inventory decisions, or fewer escalations.

Experience might change through NPS, complaint volume, response time, personalization, availability, or quality of service.

Risk might change through fewer compliance breaches, earlier fraud detection, safer workflows, better auditability, or reduced operational exposure.

An AI idea that does not touch one of those levers is not ready.

It might be research. It might be learning. It might be a useful technical exploration. But it is not yet a business strategy.

Define success in numbers before the model exists:

  • Reduce average handle time by 10 percent.
  • Increase conversion in a defined segment by 2 points.
  • Cut manual review volume by 30 percent without increasing error rate.
  • Improve first-contact resolution by 5 points.
  • Reduce high-risk false negatives by 20 percent.
  • Automate a workflow step with human review and a measured override rate.

The number can be wrong at first.

The absence of a number is worse.
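One way to make "a number before the model exists" concrete is to write the target down as data. The sketch below is illustrative only: `SuccessMetric`, its fields, and the 600-second baseline are assumptions made for the example, not something taken from a real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessMetric:
    """A numeric target agreed before any model is built (hypothetical structure)."""
    name: str
    baseline: float          # measured value before the pilot
    target: float            # value that would count as success
    higher_is_better: bool   # direction of improvement

    def is_met(self, observed: float) -> bool:
        """Return True when the observed value reaches the target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target

    def progress(self, observed: float) -> float:
        """Fraction of the baseline-to-target gap closed so far (can exceed 1.0)."""
        gap = self.target - self.baseline
        if gap == 0:
            raise ValueError("target must differ from baseline")
        return (observed - self.baseline) / gap


# Illustrative numbers only: a 10 percent cut in average handle time from 600 seconds.
handle_time = SuccessMetric(
    name="avg_handle_time_sec", baseline=600.0, target=540.0, higher_is_better=False
)
```

If the baseline turns out to be wrong, the record gets corrected. That is still better than having nothing to correct.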

The decision is the unit of strategy

AI work becomes clearer when the team stops asking "what can AI do here" and starts asking "what decision are we improving."

Call these customers this week, not those.

Route this case to a specialist, not the general queue.

Approve this low-risk claim automatically, but send that one to review.

Suggest these three knowledge articles to the support agent.

Flag this data pipeline run because the distribution changed.

Summarize this contract for review, but do not approve it automatically.

That is the level where AI becomes operational.

Models generate scores, classifications, text, recommendations, embeddings, summaries, and tool calls. Businesses change through decisions and workflows.

If nobody can name the decision, the AI work will drift toward demo logic.

This is the same argument I made in "What a Data Strategist Actually Does": data strategy is not about producing more artifacts. It is about improving decisions.

AI strategy is the same discipline with higher stakes.

Data readiness is not optional

Before modeling, ask the brutal questions.

Do we have the data?

Can we legally use it?

Do we know what the fields mean?

Can we access it without heroics?

Is it fresh enough?

Does it represent the decision point?

Does it include the outcome we want to predict or improve?

Are identifiers stable enough to join across systems?

Can we reproduce the training set later?

Do we have a way to monitor drift?

Data readiness has three layers.

The first is availability. The core tables, events, documents, interactions, outcomes, and operational records need to exist somewhere. If the use case depends on customer behavior, support interactions, transactions, survey responses, claims, orders, contracts, or product usage, the team should be able to list the sources and owners.

The second is semantics. "Active customer," "churn," "high value," "resolved case," "on time," "complaint," "fraud," and "NPS" need definitions. If different teams define the same term differently, the model will inherit the conflict.

The third is platform readiness. The organization needs enough warehouse, lakehouse, orchestration, access control, lineage, and deployment capability to run the work repeatedly. That might be BigQuery and Dataform, Snowflake and dbt, Databricks and workflows, Airflow, Dagster, or another stack. The specific tools matter less than whether the data path is governed and repeatable.

If those capabilities do not exist, the first AI project may be "modernize for AI."

That is not a failure.

It is honesty.

Readiness does not mean perfection

Teams sometimes use data readiness as a way to delay forever.

That is not the goal.

You do not need perfect data to begin. You need enough data to test a narrow claim, and you need to know where the weak spots are.

For a first pilot, document:

  • sources available now
  • sources missing
  • fields with high missingness
  • joins that are unstable
  • target definition risks
  • privacy or consent limits
  • freshness gaps
  • manual workarounds
  • data fixes required before scale

This turns data quality into a backlog instead of a vague objection.

The phrase I like is "good enough for a first model, explicit enough for a roadmap."
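Two of the items on that list, high-missingness fields and freshness gaps, are easy to compute from a sample of rows. The following is a minimal sketch, assuming rows arrive as plain dictionaries. The function names, the 20 percent and 7-day thresholds, and the sample tickets are all hypothetical.

```python
from datetime import date


def missingness(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of rows where each field is absent or None."""
    if not rows:
        raise ValueError("no rows to profile")
    return {
        f: sum(1 for r in rows if r.get(f) is None) / len(rows)
        for f in fields
    }


def readiness_backlog(rows, fields, date_field, as_of, max_missing=0.2, max_age_days=7):
    """List readiness issues as backlog items rather than a pass/fail verdict."""
    issues = []
    for field, share in missingness(rows, fields).items():
        if share > max_missing:
            issues.append(f"high missingness: {field} ({share:.0%})")
    dates = [r[date_field] for r in rows if r.get(date_field) is not None]
    if not dates:
        issues.append(f"freshness gap: no values in {date_field}")
    else:
        age = (as_of - max(dates)).days
        if age > max_age_days:
            issues.append(f"freshness gap: newest {date_field} is {age} days old")
    return issues


# Made-up sample rows for illustration.
rows = [
    {"ticket_id": 1, "issue_class": "billing", "resolved_on": date(2025, 10, 1)},
    {"ticket_id": 2, "issue_class": None, "resolved_on": date(2025, 10, 20)},
    {"ticket_id": 3, "issue_class": None, "resolved_on": None},
    {"ticket_id": 4, "issue_class": "login", "resolved_on": date(2025, 10, 15)},
]
backlog = readiness_backlog(
    rows, ["issue_class", "resolved_on"], "resolved_on", as_of=date(2025, 11, 4)
)
```

The output is a list of named issues, not a score, so each item can go straight into the backlog.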

Prioritize use cases like a portfolio

Do not let the loudest stakeholder win.

Score use cases by:

  • business value
  • data feasibility
  • workflow feasibility
  • risk
  • time to impact
  • change load
  • measurement clarity

High-value, low-feasibility ideas belong on the roadmap, not necessarily in the first sprint.

Low-value, high-feasibility ideas may be useful for learning, but they should not consume the strategy.
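The scoring criteria above can be turned into a simple weighted sum so the ranking is at least explicit and arguable. This is a sketch under assumptions: the weights, the 1 to 5 rating scale, and both example use cases are invented for illustration, and a real portfolio would tune them with stakeholders.

```python
# Hypothetical weights; negative weights penalize risk and change load.
WEIGHTS = {
    "business_value": 3.0,
    "data_feasibility": 2.0,
    "workflow_feasibility": 2.0,
    "measurement_clarity": 1.5,
    "time_to_impact": 1.0,   # rated so that faster impact gets a higher number
    "risk": -2.0,
    "change_load": -1.0,
}


def score(ratings: dict[str, int]) -> float:
    """Weighted sum of 1-5 ratings; raises if a criterion is missing or out of range."""
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        rating = ratings[criterion]
        if not 1 <= rating <= 5:
            raise ValueError(f"{criterion} must be rated 1-5, got {rating}")
        total += weight * rating
    return total


def rank(use_cases: dict[str, dict[str, int]]) -> list[tuple[str, float]]:
    """Return use cases sorted from highest to lowest score."""
    scored = ((name, score(ratings)) for name, ratings in use_cases.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up ratings for two candidate use cases.
candidates = {
    "enterprise_copilot": {
        "business_value": 5, "data_feasibility": 2, "workflow_feasibility": 2,
        "measurement_clarity": 2, "time_to_impact": 1, "risk": 4, "change_load": 5,
    },
    "queue_triage_assistant": {
        "business_value": 4, "data_feasibility": 4, "workflow_feasibility": 4,
        "measurement_clarity": 4, "time_to_impact": 4, "risk": 2, "change_load": 2,
    },
}
ranking = rank(candidates)
```

The numbers matter less than the argument they force. If someone disagrees with the ranking, they have to say which rating or weight is wrong.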

The first 1 to 3 use cases should be thin slices:

  • narrow scope
  • clear user
  • visible decision
  • measurable outcome
  • real workflow integration
  • pilotable in 3 to 6 months

Examples:

  • A call center triage assistant for one queue.
  • A churn risk score for one high-value segment.
  • A document summarization workflow for one legal review pattern.
  • A data quality anomaly detector for one critical pipeline.
  • A next-best-action model for one retention campaign.

The point is not to solve the whole enterprise in one move.

The point is to prove that the organization can turn AI into changed behavior.

Work backward from the workflow

Architecture should follow the decision.

Ask:

Who uses the output?

Where do they use it?

How often?

What action changes?

What happens if the model is wrong?

Can the user override it?

How will feedback return?

Does the decision need batch scoring, online inference, search, retrieval, a copilot, an API, an agent, or a plain dashboard?

A propensity model might only need weekly batch scoring joined into a CRM or marketing platform.

A fraud or eligibility decision might need online scoring with strict latency and audit requirements.

A support copilot might need retrieval-augmented generation, source citations, prompt logging, human feedback, and safety filters.

An agent workflow might need MCP tools, API boundaries, user approval, and context management.

The wrong architecture can make a good model useless.
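As a rough illustration of letting the decision pick the architecture, the workflow questions can be reduced to a few yes/no answers and mapped to a serving pattern. Everything here is an assumption made for the sketch: the `DecisionProfile` fields, the order of the rules, and the pattern names. A real design review would weigh far more than three flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionProfile:
    """Hypothetical answers to the workflow questions for one decision."""
    needs_answer_within_seconds: bool   # is someone or something waiting on the output?
    grounded_in_documents: bool         # must the output cite policies, runbooks, or tickets?
    changes_system_state: bool          # will the system write records or trigger workflows?


def serving_pattern(profile: DecisionProfile) -> str:
    """Pick the simplest serving pattern that fits; the rule order is an assumption."""
    if profile.changes_system_state:
        return "agent with scoped tools and user approval"
    if profile.grounded_in_documents:
        return "retrieval-augmented copilot with citations"
    if profile.needs_answer_within_seconds:
        return "online scoring API"
    return "scheduled batch scoring"


# Examples mirroring the cases above.
weekly_propensity = DecisionProfile(False, False, False)
fraud_check = DecisionProfile(True, False, False)
support_copilot = DecisionProfile(True, True, False)
```

The default is the cheapest pattern. Anything more complex has to be justified by an answer to one of the questions.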

This is why I connect AI strategy to "API Design for MCP Server Boundaries." If AI systems will act through tools, APIs, and enterprise systems, the integration contracts become part of the strategy.

Start with baselines

Baseline first.

Always.

For predictive use cases, start with rules, logistic regression, decision trees, gradient boosting, or another simple model that can be explained and evaluated quickly.

For GenAI use cases, start with prompting, retrieval, policy rules, and human review before fine-tuning.

For agentic workflows, start with one or two narrow tools before building a complex multi-agent system.

Sophistication should be earned.

The first baseline tells you whether the data contains signal, whether the workflow can use the output, and whether the metric can move. If a simple model cannot be operationalized, a complex model will usually make the failure more expensive.

This does not mean advanced AI is unnecessary.

It means advanced AI should solve the problem that remains after the simple version teaches you something.
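A baseline can be as small as a hand-written rule compared against "always predict the majority class." The sketch below assumes a toy churn dataset. The 30-day inactivity rule, the field name `days_inactive`, and the labels are invented for illustration.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Share of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty lists")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def majority_baseline(labels: list[int]) -> list[int]:
    """Predict the most common label for every row."""
    majority = max(set(labels), key=labels.count)
    return [majority] * len(labels)


def rule_baseline(rows: list[dict]) -> list[int]:
    """Hypothetical hand-written rule: flag churn risk after 30+ inactive days."""
    return [1 if row["days_inactive"] >= 30 else 0 for row in rows]


# Made-up data: 1 means the customer churned.
rows = [{"days_inactive": d} for d in [2, 45, 10, 60, 5, 31, 3, 40]]
labels = [0, 1, 0, 1, 0, 0, 0, 1]

majority_acc = accuracy(majority_baseline(labels), labels)   # 0.625
rule_acc = accuracy(rule_baseline(rows), labels)             # 0.875
```

Any later model has to beat the rule, not just the majority class. Accuracy is used here only to keep the example short. On imbalanced data a metric tied to the decision cost would be a better yardstick.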

For predictive AI, define the label carefully

The label is where many ML projects quietly break.

Churn is not just "customer left." It needs a time window and a decision point.

NPS is not just a score. It has survey timing, response bias, customer segment, and operational context.

Fraud is not just a flag. It has investigation lag and false positive cost.

SLA breach is not just an event. It depends on start time, stop time, exclusions, and policy.

The model should only train on data available at the time the decision would have been made.

No future leakage.

No labels that depend on post-action behavior unless that is explicitly modeled.

No target definitions that change halfway through history.

No silent exclusions that make the training data easier than reality.
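The decision-point rule is easier to enforce when it is built into how training rows are constructed. Here is a minimal sketch for a churn label, assuming events are dictionaries with a `date` key. The 90-day window and the feature names are hypothetical choices, not a recommended definition.

```python
from datetime import date, timedelta


def build_example(events: list[dict], decision_date: date, window_days: int = 90) -> dict:
    """Build one point-in-time training row for a hypothetical churn label.

    Features use only events on or before decision_date. The label looks at the
    window after it: churned means no activity in the next window_days.
    """
    window_end = decision_date + timedelta(days=window_days)
    past = [e for e in events if e["date"] <= decision_date]
    future = [e for e in events if decision_date < e["date"] <= window_end]
    last_seen = max((e["date"] for e in past), default=None)
    return {
        "decision_date": decision_date,
        # Features: known at decision time.
        "events_before_decision": len(past),
        "days_since_last_event": (decision_date - last_seen).days if last_seen else None,
        # Label: observed only after the decision.
        "churned": int(len(future) == 0),
    }


# Made-up activity history for one customer.
events = [
    {"date": date(2025, 1, 10)},
    {"date": date(2025, 2, 20)},
    {"date": date(2025, 6, 15)},
]
row = build_example(events, decision_date=date(2025, 3, 1))
```

The June event falls outside the 90-day window, so this customer counts as churned at the March decision point even though they came back later. A row like this should only be built once the full window has elapsed. Otherwise the label is a guess.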

The posts "Plain-Language Machine Learning Metrics for Real Decisions" and "A scikit-learn Pipeline for Calibrated Decisions" go deeper on the modeling side. The strategy point is simpler:

If the target is wrong, the model can be impressive and still useless.

For GenAI, RAG is often the first serious move

Fine-tuning is tempting because it sounds like ownership.

For many enterprise knowledge use cases, retrieval-augmented generation is the better first move.

Use RAG when the model needs access to changing documents, policies, runbooks, catalogs, contracts, tickets, or product knowledge. Keep that knowledge in an index that can be updated, permissioned, evaluated, and cited.

Do not treat RAG as "dump documents into a vector store."

The strategy needs:

  • source inventory
  • permission model
  • chunking plan
  • metadata strategy
  • retrieval evaluation
  • answer evaluation
  • citation requirements
  • freshness expectations
  • fallback behavior
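Retrieval evaluation from the list above can start with a small golden set and recall@k, before any answer quality is judged. The sketch assumes a retriever is any callable that returns ranked document IDs. The golden questions, the document IDs, and the stub retriever are all invented for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant document IDs found in the top k results."""
    if not relevant:
        raise ValueError("each golden question needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate_retrieval(golden_set: list[dict], retrieve, k: int = 3) -> float:
    """Mean recall@k over a golden set; `retrieve` is any callable returning ranked IDs."""
    scores = [
        recall_at_k(retrieve(item["question"]), item["relevant_ids"], k)
        for item in golden_set
    ]
    return sum(scores) / len(scores)


# Made-up golden set and a stub retriever standing in for a real index.
golden_set = [
    {"question": "refund window", "relevant_ids": {"policy-12"}},
    {"question": "reset MFA", "relevant_ids": {"runbook-4", "policy-7"}},
]
canned_results = {
    "refund window": ["policy-12", "faq-3", "kb-9"],
    "reset MFA": ["faq-1", "runbook-4", "kb-2", "policy-7"],
}
mean_recall = evaluate_retrieval(golden_set, canned_results.__getitem__, k=3)
```

In the second question the relevant policy sits at rank four, so it is missed at k=3. That kind of miss is invisible if only the final answer is reviewed.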

The post "Context Engineering Keeps Long Context Useful" covers why context quality matters. Long windows do not remove the need to select information carefully. They make information hygiene more important.

For agents, tools are operating permissions

Agent strategy should be even more cautious.

An agent is not only a chat interface. It can call tools, read resources, write records, trigger workflows, and change state.

That means the strategy must define:

  • what the agent can read
  • what the agent can write
  • what requires user approval
  • what is logged
  • what is reversible
  • what is forbidden
  • which APIs enforce the boundary
  • which errors are safe to expose
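Several of those definitions fit in one small policy table that is checked before every tool call. This is a sketch, not a real agent framework API. The tool names, the `Decision` enum, and the in-memory `audit_log` are assumptions made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical policy for a support agent; tool names are illustrative.
POLICY = {
    "read_ticket": Decision.ALLOW,
    "search_policy_docs": Decision.ALLOW,
    "draft_reply": Decision.ALLOW,
    "issue_refund": Decision.NEEDS_APPROVAL,
    "close_account": Decision.DENY,
}

audit_log: list[dict] = []


def authorize(tool: str, user_approved: bool = False) -> bool:
    """Return True if the call may run. Unknown tools are denied by default."""
    decision = POLICY.get(tool, Decision.DENY)
    allowed = decision is Decision.ALLOW or (
        decision is Decision.NEEDS_APPROVAL and user_approved
    )
    # Every attempt is logged, including the ones that were refused.
    audit_log.append({"tool": tool, "decision": decision.value, "allowed": allowed})
    return allowed
```

Two choices carry most of the weight here: unknown tools are denied, and refused attempts are logged. In a real system the same check belongs in the API that owns the data, not only in the agent process.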

I wrote about this in "Codex Plugins Extend Agents, Not Interfaces" and "ADK Agent Memory Is an Operating Boundary." The lesson is the same: agent capability is system access. Treat it that way.

An AI strategy that ignores permissions is not a strategy.

It is a risk backlog.

Prove value with real users

A pilot is not successful because the model runs.

It is successful if it changes behavior and improves the metric named at the beginning.

That requires a controlled test.

Use a holdout, A/B test, phased rollout, shadow mode, or champion-challenger comparison depending on the use case.

Track:

  • business metric
  • operational metric
  • quality metric
  • adoption metric
  • override rate
  • user feedback
  • risk events
  • cost

For example:

A support copilot should not only be judged by answer quality in a sandbox. It should be judged by handle time, agent adoption, edit rate, customer satisfaction, escalation rate, policy violations, and cost per conversation.

A churn model should not only be judged by AUC. It should be judged by lift in the contacted segment, incremental retention, outreach cost, false positive burden, calibration, and whether the team actually used the score.

If the pilot does not change the workflow, model quality is mostly academic.
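For the churn example, "incremental retention" means comparing the contacted group with a holdout that was scored but not contacted. The sketch below assumes random assignment to the two groups. The group sizes, outcomes, and outreach cost are made up, and it does not test statistical significance, which a real pilot would need.

```python
def rate(outcomes: list[int]) -> float:
    """Share of positive outcomes (1 = retained, 0 = churned)."""
    if not outcomes:
        raise ValueError("group is empty")
    return sum(outcomes) / len(outcomes)


def incremental_lift(treated: list[int], holdout: list[int]) -> dict[str, float]:
    """Compare the contacted group with a randomly assigned holdout."""
    treated_rate, holdout_rate = rate(treated), rate(holdout)
    return {
        "treated_rate": treated_rate,
        "holdout_rate": holdout_rate,
        "absolute_lift": treated_rate - holdout_rate,
    }


def cost_per_incremental_save(outreach_cost: float, n_treated: int, absolute_lift: float):
    """Outreach cost divided by the extra customers retained; None if there is no lift."""
    extra_retained = absolute_lift * n_treated
    if extra_retained <= 0:
        return None
    return outreach_cost / extra_retained


# Made-up pilot: 200 contacted customers, 200 held out.
treated = [1] * 170 + [0] * 30
holdout = [1] * 150 + [0] * 50
result = incremental_lift(treated, holdout)
cost = cost_per_incremental_save(2000.0, len(treated), result["absolute_lift"])
```

Here the lift is about 10 points and each extra retained customer costs roughly 100 in outreach. A model with a better AUC but no lift over the holdout would score zero on this measure.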

Production means MLOps and LLMOps

Once a pilot proves value, the work becomes industrial.

Minimum production capabilities:

  • versioned code
  • versioned data schemas
  • versioned prompts or model configs
  • reproducible training or evaluation
  • artifact registry
  • deployment pipeline
  • staging and production separation
  • monitoring
  • rollback
  • access control
  • incident response

For predictive ML, monitor:

  • feature drift
  • label drift
  • prediction distribution
  • calibration
  • model performance when labels arrive
  • business KPI movement
  • latency and failures
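Feature drift is often tracked with a population stability index (PSI) between a reference sample and recent data. The sketch below is one common formulation with equal-width bins taken from the reference sample. The bin count and the small floor for empty bins are assumptions, and the frequently quoted 0.25 alert level is a rule of thumb, not a standard.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population stability index between a reference sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up samples: the recent data is squeezed into the upper half of the range.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

no_drift = psi(reference, reference)
drift = psi(reference, shifted)
```

An identical sample scores exactly zero, and the shifted one scores well above the rule-of-thumb level. PSI only says the input changed. Whether the model got worse still has to wait for labels.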

For LLM systems, monitor:

  • retrieval quality
  • answer quality
  • citation quality
  • hallucination rate
  • refusal behavior
  • policy violations
  • latency
  • cost per task
  • tool errors
  • user feedback

For agent systems, monitor:

  • tool calls
  • tool failure rate
  • permission denials
  • approval rate
  • unsafe attempts
  • task completion
  • human intervention
  • state changes

At that point, the AI strategy becomes an operating model.

Without it, pilots rot.

Governance should be designed early

Governance is not where AI goes to die.

Bad governance can do that. Good governance keeps success from turning into a regulatory, ethical, or operational mess.

Every organization needs an AI use-case inventory:

  • owner
  • purpose
  • users
  • affected population
  • data sources
  • model type
  • vendor dependencies
  • risk level
  • monitoring plan
  • approval status
  • review date

Higher-risk use cases need more scrutiny:

  • credit
  • hiring
  • healthcare
  • insurance
  • legal decisions
  • adverse customer actions
  • surveillance
  • identity
  • safety-critical workflows

Define what is allowed, what requires review, and what is forbidden.

Define human-in-the-loop requirements.

Define data retention.

Define vendor controls.

Define appeal and override paths where decisions affect people materially.
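An inventory is more useful when a few of these rules can be checked automatically. The following is a minimal sketch with a subset of the inventory fields. The `UseCaseRecord` class, the status strings, and the three blocking rules are hypothetical, and the high-risk domain names are shorthand for the list above.

```python
from dataclasses import dataclass
from datetime import date

# Shorthand labels for the higher-risk categories listed above.
HIGH_RISK_DOMAINS = {
    "credit", "hiring", "healthcare", "insurance", "legal",
    "adverse_customer_action", "surveillance", "identity", "safety_critical",
}


@dataclass
class UseCaseRecord:
    """One row in a hypothetical AI use-case inventory."""
    name: str
    owner: str
    purpose: str
    domain: str
    data_sources: list[str]
    human_in_the_loop: bool
    approval_status: str      # e.g. "draft", "approved", "rejected"
    review_date: date

    def blocking_issues(self, today: date) -> list[str]:
        """Reasons this use case should not run in production yet."""
        issues = []
        if self.approval_status != "approved":
            issues.append("not approved")
        if self.domain in HIGH_RISK_DOMAINS and not self.human_in_the_loop:
            issues.append("high-risk domain without human review")
        if self.review_date < today:
            issues.append("review overdue")
        return issues


# Made-up record for illustration.
claims_triage = UseCaseRecord(
    name="claims_auto_approval", owner="claims-ops", purpose="approve low-risk claims",
    domain="insurance", data_sources=["claims", "policies"], human_in_the_loop=False,
    approval_status="approved", review_date=date(2025, 6, 30),
)
issues = claims_triage.blocking_issues(today=date(2025, 11, 4))
```

The check does not replace a review. It just makes sure an overdue or under-controlled use case shows up on a list instead of in an incident report.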

This connects to "Principle Stacks Make Trade-offs Explicit." AI governance is much easier when the organization knows what outranks what. Safety, trust, customer value, speed, and cost discipline will collide. The stack should decide the default.

Ownership cannot be vague

AI strategy fails when everyone owns it.

That means nobody owns it.

Each serious AI use case needs a durable product team, even if small:

  • business owner for the KPI
  • product owner for the workflow
  • data engineering owner for pipelines
  • ML or AI engineer for model/evaluation
  • platform owner for deployment and observability
  • risk/compliance partner for higher-risk cases
  • change management owner for adoption

Committees can govern a portfolio.

They cannot operate a product.

The team closest to the workflow must own whether the AI system changes behavior. The technical team must own whether the system is reliable and measurable. Leadership must own the decision to scale, pause, or kill the effort.

A worked example: support escalation triage

Here is the kind of example I would use to test whether the strategy is real.

A support organization wants AI because escalation queues are slow. The first request is a copilot that summarizes tickets and recommends the next action. That sounds reasonable, but it is still too broad.

The business lever is not "use AI in support." The lever is reducing avoidable escalation time without increasing policy violations or customer frustration.

Strategy layer   | Concrete decision
Business lever   | Reduce avoidable escalation time for repeat issue classes
Workflow owner   | Support operations owns the triage queue
Data readiness   | Ticket history, policy docs, escalation labels, and resolution outcomes must be joined
Pilot slice      | Three issue classes with enough historical volume and clear policy boundaries
Evaluation       | Correct route, cited evidence, abstention on missing policy, handle-time impact
Governance       | Human approval required for refund, compliance, and account-risk actions
Deployment       | Agent drafts recommendation inside existing support tool
Operating metric | p50 triage time, escalation precision, edit rate, policy violation rate, cost per resolved case

The first version I would reject is the demo-first version: summarize any ticket, recommend anything, and let the model impress the room. That proves almost nothing.

The thin-slice version is narrower and more useful. Pick three issue classes where historical decisions are recoverable. Build a golden set. Measure whether the system routes correctly, cites the right policy, and abstains when evidence is missing. Put the draft recommendation in front of reviewers instead of customers. Watch edit rate and override reasons.

Figure: A real AI strategy loop starts with the business lever and returns to operating metrics after deployment.

The tradeoff is slower scope expansion. The narrow pilot may feel less impressive than a broad support copilot. I would accept that tradeoff because a narrow pilot can answer the only question that matters: did the system change support behavior without increasing risk?

The metric that would change my mind is edit rate. If reviewers rewrite most recommendations, the model may still be useful as a retrieval aid, but it is not ready to own recommendation quality. If edit rate is low, escalation precision holds, and policy violations stay at zero in the pilot, then the next issue class becomes a rational expansion.
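The expansion rule in that paragraph can be written as a small gate over reviewer records. This is a sketch under assumptions: the review record fields and the 20 percent edit-rate and 90 percent precision thresholds are placeholders chosen for the example. Only the zero-violation condition comes from the text.

```python
def pilot_metrics(reviews: list[dict]) -> dict[str, float]:
    """Summarize reviewer decisions on drafted recommendations."""
    if not reviews:
        raise ValueError("no reviews to summarize")
    escalated = [r for r in reviews if r["recommended_escalation"]]
    return {
        "edit_rate": sum(r["edited"] for r in reviews) / len(reviews),
        "escalation_precision": (
            sum(r["escalation_was_correct"] for r in escalated) / len(escalated)
            if escalated else float("nan")
        ),
        "policy_violations": sum(r["policy_violation"] for r in reviews),
    }


def ready_to_expand(m: dict, max_edit_rate: float = 0.2, min_precision: float = 0.9) -> bool:
    """Thresholds are placeholders; the zero-violation rule comes from the text."""
    return (
        m["edit_rate"] <= max_edit_rate
        and m["escalation_precision"] >= min_precision
        and m["policy_violations"] == 0
    )


# Made-up pilot: ten reviewed drafts, one edited, four escalations, all correct.
reviews = [
    {
        "edited": i == 0,
        "recommended_escalation": i < 4,
        "escalation_was_correct": i < 4,
        "policy_violation": False,
    }
    for i in range(10)
]
metrics = pilot_metrics(reviews)
expand = ready_to_expand(metrics)
```

If no escalations were recommended, precision is undefined and the gate stays closed, which seems like the right default for a pilot that has not been exercised yet.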

The repeatable loop

I would run AI strategy as a loop:

  1. Diagnose
  2. Prioritize
  3. Pilot
  4. Scale
  5. Govern and improve

Diagnose:

Name the business levers, data readiness, platform gaps, workflow constraints, and governance risks.

Prioritize:

Score use cases by value, feasibility, risk, time to impact, and measurement clarity.

Pilot:

Build a thin slice with a real user, real data, real workflow, and real metric.

Scale:

Industrialize the winners with MLOps, LLMOps, integration, monitoring, training, and ownership.

Govern and improve:

Monitor drift, cost, quality, adoption, risk, and business impact. Retrain, revise, or retire systems that stop earning their keep.

Each cycle ends with one question:

Did we move the business metric we named at the start?

If yes, scale carefully.

If no, fix the data, fix the workflow, revise the use case, or stop.

Stopping is strategy too.

The executive version

For leadership, I would compress the whole thing into five statements.

AI strategy starts with measurable business levers, not model selection.

Data readiness determines which use cases are real now and which are roadmap items.

Thin-slice pilots prove value by changing workflow behavior, not by impressing a demo room.

Production AI requires MLOps, LLMOps, monitoring, governance, and clear ownership.

The only AI systems worth scaling are the ones that move the metric they were built to move.

That is the standard.

It is simple enough to say in a meeting.

It is hard enough to keep teams honest.
