The Question About Your AI Agent Has Changed

Outcome focus: Reframed agent deployment decisions around permission scope and blast radius rather than capability, reducing the risk of production failures from over-permissioned agentic systems.

I have mostly stopped asking whether an AI agent can do something.

The question used to be worth asking. A year ago it was genuinely unclear whether an agent could reliably complete a multi-step research workflow, write and execute code against a live database, coordinate with other agents to handle a compliance review, or take action inside a third-party API on a user's behalf. Those were open questions. They are mostly closed now, at least at the capability level. The models are good enough. The tooling exists. The answer to "can it do this?" is almost always yes.

What I ask now is different: what is the agent permitted to do? What specifically is it not permitted to do? And separate from both of those — will it do what it is supposed to do consistently, or will it succeed in pilots and degrade in production?

Those are harder questions, and they are the ones that actually determine whether an agentic system is worth deploying.

The capability assumption has flipped

The field has moved faster than most organizations' thinking about it. Teams are still having the "can it?" conversation while the more consequential conversation — "should it, and under what conditions?" — is being skipped entirely.

The empirical data on this is uncomfortable. In a March 2026 survey of enterprise AI deployments, 78% of organizations had AI agent pilots running. Fourteen percent had reached production scale. The gap between those numbers is not a capability problem. Agents in pilots are demonstrably capable of the tasks they are being evaluated on. The gap is operational — integration with legacy systems, output degradation on edge cases, missing monitoring infrastructure, unclear ownership between business and technical teams, and no governance model that answers what the agent is actually authorized to do when things get ambiguous.

The organizations that did make it to production scale shared one differentiator: they invested proportionally more in evaluation infrastructure, monitoring, and operational governance — not in model selection. The question they were answering was not "what can this agent do?" It was "what is this agent authorized to do, how do we know when it is doing it wrong, and how do we stop it when it is?"

What "permitted" means architecturally

Permission is not just a policy document. It is an architectural property of the system, and it has a measurable consequence when it is too broad.

The term that captures this consequence is blast radius. The formula is straightforward: blast radius equals access scope multiplied by operating velocity multiplied by detection window. An agent with broad access, running thousands of operations per hour, with no monitoring that would trigger an alert — that is not a risk. That is an incident waiting to accumulate.
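To make the product above concrete, here is a minimal Python sketch. The class name `AgentExposure`, its field names, and every number in it are illustrative assumptions rather than figures from any real deployment, and the result is best read as a relative score for comparing designs, not as a measure of actual damage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentExposure:
    """Hypothetical inputs to the blast radius product described above."""

    access_scope: int        # resources the agent can touch (assumed unit: records)
    ops_per_hour: float      # operating velocity
    detection_hours: float   # time from failure onset to containment

    def blast_radius(self) -> float:
        # Product of the three factors; a relative score, not a damage count.
        return self.access_scope * self.ops_per_hour * self.detection_hours


# Same velocity and detection window; only the access scope differs.
broad = AgentExposure(access_scope=500, ops_per_hour=2000, detection_hours=4)
narrow = AgentExposure(access_scope=5, ops_per_hour=2000, detection_hours=4)

ratio = broad.blast_radius() / narrow.blast_radius()
print(f"broad scope is {ratio:.0f}x the narrow scope")
```

Because the three factors multiply, shrinking any one of them shrinks the whole product by the same proportion. Access scope is usually the factor a team can change before deployment.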

The scale difference between human error and agent error is the thing most teams do not internalize early enough. When a human makes an access decision incorrectly, the error is bounded to that interaction. When an agent operating under the same (incorrect) permissions makes the same error, it makes it at whatever rate the agent runs — continuously, without fatigue, until something stops it. One team's AI procurement agent, given inherited "full ordering authority" as a convenience, generated $47,000 in unauthorized vendor orders from a single off-by-one logic error. No breach. No attack. A logic error operating under excess permissions at machine velocity.

The OWASP Top 10 for Agentic Applications, finalized in late 2025 by over a hundred security practitioners, documents this failure mode as its third-ranked risk. An agent with legitimate scheduling access can map an entire organizational structure through inherited credentials. An agent with legitimate file-read access becomes an exfiltration tool when its reasoning is manipulated. The capability and the threat are the same object. The authorization boundary is the only thing separating them.

Why least privilege for agents is different from least privilege for systems

Least privilege is not a new idea. It is the foundation of every serious security model. But applying it to agents is structurally different from applying it to users or services, and the difference matters.

A human user or a microservice has a defined role with a defined permission set. That permission set can be established at deploy time because the role is known in advance. An agent determines what it needs at runtime. It does not know at design time which API calls it will need to make, which files it will need to read, or which downstream services it will need to invoke. So teams over-provision permissions "just in case" — and that excess access is the primary attack surface.

Static permission models fail for agents for the same reason that giving every employee in a building a master key fails: the over-provisioning is the vulnerability, and it does not require an attack to materialize as a problem. A logic error, an unexpected edge case, or a subtly wrong instruction is enough.

The architecture that works is different: short-lived, task-scoped credentials minted at runtime for the specific operation in progress. An agent executing a data retrieval task gets read-only access to the specific dataset for the duration of that task. The session expires when the task completes. If the same agent is later executing a write task, it gets a different credential scoped to that write operation. Research cited in recent least-privilege literature shows a 92% reduction in credential-theft incidents when moving from 24-hour session tokens to 300-second task-scoped tokens. The mechanism works when the secure path is made the convenient path — if ephemeral scoping requires extra engineering effort every time, teams build workarounds that negate it.
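The credential pattern described above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a real token service: `TaskCredential`, `mint_credential`, `is_allowed`, and the resource name `dataset:sales_q3` are all hypothetical, and a production system would delegate minting and verification to its identity provider.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskCredential:
    """Hypothetical credential bound to one resource, one action, one time window."""

    token: str
    resource: str
    action: str         # "read" or "write"
    expires_at: float   # seconds on the monotonic clock


def mint_credential(resource: str, action: str, ttl_seconds: float = 300.0,
                    now: Optional[float] = None) -> TaskCredential:
    # `now` is injectable so expiry can be exercised without sleeping.
    issued = time.monotonic() if now is None else now
    return TaskCredential(secrets.token_urlsafe(32), resource, action,
                          issued + ttl_seconds)


def is_allowed(cred: TaskCredential, resource: str, action: str,
               now: Optional[float] = None) -> bool:
    current = time.monotonic() if now is None else now
    # All three must hold: not expired, same resource, same action.
    return (current < cred.expires_at
            and cred.resource == resource
            and cred.action == action)


read_cred = mint_credential("dataset:sales_q3", "read", now=0.0)
print(is_allowed(read_cred, "dataset:sales_q3", "read", now=10.0))   # within scope and TTL
print(is_allowed(read_cred, "dataset:sales_q3", "write", now=10.0))  # wrong action
print(is_allowed(read_cred, "dataset:sales_q3", "read", now=301.0))  # expired
```

The 300-second default mirrors the token lifetime mentioned above. A write task would call `mint_credential` again and receive a separate credential rather than widening the existing one.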

The three questions, not one

The original question — "can it do this?" — has mostly collapsed into a yes. But it left two questions behind it that are harder and more important.

The first is the permission question: what is the agent authorized to do, and what is it explicitly not authorized to do? This is not just a security question. It is a product design question and a regulatory question. Agents operating against business data, customer records, financial systems, or regulated workflows inherit the compliance requirements of those systems. An agent that can read and write health records is subject to HIPAA regardless of whether anyone declared it. The authorization boundary defines the compliance perimeter.

A practical framework for scoping agent action risk comes from OWASP's cheat sheet: tier agent actions by consequence. Low-risk reads and safe queries can be auto-approved. Writes and external API calls warrant review. Financial transactions and external communications require human-in-the-loop approval. Irreversible operations require mandatory human approval with documented rollback options. This is not bureaucracy. It is the structural answer to "what is this agent permitted to do?" made concrete enough to implement.
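The tiers above translate fairly directly into a lookup table. The sketch below is one possible encoding: the action-kind strings and the `required_approval` helper are hypothetical names, and sending unknown action kinds to the strictest tier is an assumption of this sketch (fail closed), not something the tiering itself prescribes.

```python
from enum import Enum


class Approval(Enum):
    AUTO = "auto-approve"
    REVIEW = "review"
    HUMAN_IN_LOOP = "human-in-the-loop approval"
    MANDATORY_HUMAN = "mandatory human approval with documented rollback"


# Hypothetical action kinds mapped to the four tiers described above.
ACTION_TIERS = {
    "read": Approval.AUTO,
    "safe_query": Approval.AUTO,
    "write": Approval.REVIEW,
    "external_api_call": Approval.REVIEW,
    "financial_transaction": Approval.HUMAN_IN_LOOP,
    "external_communication": Approval.HUMAN_IN_LOOP,
    "irreversible_operation": Approval.MANDATORY_HUMAN,
}


def required_approval(action_kind: str) -> Approval:
    # Assumption: anything not explicitly classified gets the strictest tier.
    return ACTION_TIERS.get(action_kind, Approval.MANDATORY_HUMAN)


for kind in ("read", "write", "financial_transaction", "drop_table"):
    print(f"{kind}: {required_approval(kind).value}")
```

Keeping the mapping in data rather than scattered through agent code means the answer to "what is this agent permitted to do?" can be reviewed and audited in one place.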

The second question is the reliability question: will the agent do what it is supposed to do, not just in the pilot, but at production volume, on edge cases, when the input distribution shifts, when the tool it depends on responds unexpectedly? The answer to this question is almost always "not as reliably as the pilot suggested." Edge cases that account for two percent of inputs in a controlled evaluation become hundreds of incorrect outputs per day at production scale. Without monitoring infrastructure that surfaces those failures as they accumulate rather than weeks later when the first complaint arrives, the production deployment is running blind.
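As a rough illustration of the scale effect: at an assumed 10,000 requests per day, a two percent edge-case share is 200 requests per day that may go wrong. Below is a minimal sketch of monitoring that surfaces such failures as they accumulate. The class name `RollingFailureMonitor`, the window size, and the threshold are illustrative assumptions, and a real deployment would feed an alerting system rather than return a boolean.

```python
from collections import deque


class RollingFailureMonitor:
    """Hypothetical monitor: flags when the failure rate over the last N outcomes is too high."""

    def __init__(self, window_size: int = 500, max_failure_rate: float = 0.01):
        self.outcomes = deque(maxlen=window_size)  # oldest outcomes fall off automatically
        self.max_failure_rate = max_failure_rate

    def record(self, ok: bool) -> bool:
        """Record one outcome; return True when the threshold is breached."""
        self.outcomes.append(ok)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # window not yet full, too little data to judge
        failures = sum(1 for outcome in self.outcomes if not outcome)
        return failures / len(self.outcomes) > self.max_failure_rate


monitor = RollingFailureMonitor(window_size=100, max_failure_rate=0.01)
for _ in range(100):
    monitor.record(True)           # healthy traffic fills the window

first = monitor.record(False)      # 1 failure in 100: at the threshold, not over it
second = monitor.record(False)     # 2 failures in 100: over the threshold
print(first, second)
```

A fixed-size window keeps the check cheap and makes the alert reflect recent behavior, which matters when the input distribution shifts after launch.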

Anthropic's own guidance on building reliable agents is direct on this point: agents are best suited to open-ended problems where the required number of steps cannot be predicted in advance and a fixed path cannot be hardcoded. For well-defined tasks with fixed paths, deterministic systems are preferable, and this comes from the organization that builds the models. For tasks where agents are appropriate, the design must include explicit stopping conditions, sandboxed testing, human checkpoints, and environmental verification at each step. These are not optional additions. They are what separates a production system from a demo.
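Two of those requirements, an explicit stopping condition and verification at each step, can be enforced in the loop itself rather than left in a design doc. The sketch below assumes a hypothetical `run_bounded` driver with caller-supplied `step`, `verify`, and `is_done` callables. It is not taken from any vendor's guidance or SDK, and it leaves out sandboxing and human checkpoints.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]


def run_bounded(step: Callable[[State], State],
                verify: Callable[[State], bool],
                is_done: Callable[[State], bool],
                state: State,
                max_steps: int = 10) -> Tuple[str, State]:
    """Run an agent loop with a hard step budget and per-step verification."""
    for _ in range(max_steps):
        candidate = step(state)
        if not verify(candidate):
            # Reject the unverified result and return the last verified state.
            return "halted: verification failed", state
        state = candidate
        if is_done(state):
            return "completed", state
    # The budget is enforced by the loop, not by the agent's own judgment.
    return "halted: step budget exhausted", state


# Toy stand-ins: the "agent" increments a counter until it reaches 3.
def increment(s: State) -> State:
    return {"n": s["n"] + 1}


status, final = run_bounded(increment,
                            verify=lambda s: s["n"] <= 100,
                            is_done=lambda s: s["n"] >= 3,
                            state={"n": 0})
print(status, final)
```

Returning a status string alongside the state makes every exit path observable, so a monitoring layer can tell a clean completion from a forced halt.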

What this changes about how you build

The capability question was mostly about the model layer. The permission question and the reliability question are about the infrastructure layer, the governance layer, and the operational layer.

Sixty percent of organizations currently cannot terminate a misbehaving agent in production. That means for most teams deploying agents today, the detection window in the blast radius formula is structurally open-ended — they can reduce access scope, but they cannot reliably close the time window between failure onset and containment. In that environment, the only lever that actually works is making the access scope as narrow as possible before the agent is ever deployed.

Microsoft's Azure AI team recently published a framework that gates agent deployment decisions before the architecture question even comes up. If every valid output for every valid input can be enumerated in unit tests, do not use an agent. If the domain requires hallucination guarantees, sub-100ms latency, or regulatory explainability of individual decisions, do not use an agent. The governance answer in those cases is no — not "build more guardrails," just no. For the cases where agents are the right tool, the framework requires audit-grade observability, human review gates, immutable audit logs, and evaluation pipelines as prerequisites to production deployment, not retrofits after the fact.
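The gating logic described above can be written as a short pre-deployment check. This is a paraphrase of the criteria as stated in this article, not the published framework's own code or terminology: `UseCase`, `deployment_gate`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UseCase:
    """Hypothetical description of a candidate agent use case."""

    outputs_enumerable_in_unit_tests: bool
    needs_hallucination_guarantees: bool
    needs_sub_100ms_latency: bool
    needs_per_decision_explainability: bool


PREREQUISITES = (
    "audit-grade observability",
    "human review gates",
    "immutable audit logs",
    "evaluation pipelines",
)


def deployment_gate(use_case: UseCase, in_place: Iterable[str]) -> str:
    # Hard "no" conditions come first; guardrails do not override them.
    if use_case.outputs_enumerable_in_unit_tests:
        return "do not use an agent: outputs are fully enumerable"
    if (use_case.needs_hallucination_guarantees
            or use_case.needs_sub_100ms_latency
            or use_case.needs_per_decision_explainability):
        return "do not use an agent: domain constraint"
    available = set(in_place)
    missing = [p for p in PREREQUISITES if p not in available]
    if missing:
        return "not ready for production: missing " + ", ".join(missing)
    return "eligible for production"


open_ended = UseCase(False, False, False, False)
print(deployment_gate(open_ended, ["human review gates", "immutable audit logs"]))
print(deployment_gate(open_ended, PREREQUISITES))
```

Ordering the checks this way keeps the "just no" cases from being argued into a yes by adding more controls, which is the point the framework makes.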

Internal data from that team shows that organizations with governance tools in place before scaling get twelve times more AI projects into production than those without. Governance is not a constraint on deployment. It is the mechanism that makes deployment possible at scale without the incident rate that forces rollbacks.

The question I ask now

When someone brings me an agent design, I am no longer primarily interested in what it can do. I want to know what its minimum viable permission set is — not what it might need eventually, but what it specifically needs to complete the defined task. I want to know what the blast radius looks like if a logic error propagates for four hours before detection. I want to know what the stopping condition is and whether it is enforced in the system or just documented in a design doc. I want to know what the monitoring produces and who acts on it.

These are not hostile questions toward the agent design. They are the questions that determine whether the agent design survives contact with production. The models can do what we claim. The architecture around them is what determines whether that capability translates into something a business can rely on.

The hard question is not capability. It never was, for very long.