Tag: agents

25 entries tagged "agents" — 13 posts, 12 links.

Posts

Why the May 2026 state-machine story is not finite automata becoming fashionable again, but durable execution becoming the reliability boundary for agents, workflow engines, and document-heavy product systems.

Outcome: Reader can distinguish statechart formalism from durable execution, pick the right runtime for agents and long-running workflows, and model document-heavy product lifecycles with explicit states, transitions, checkpoints, and ownership.

Why custom LLM logging leaves you flying blind in production, and how OpenTelemetry's GenAI semantic conventions turn every model call, tool invocation, and agent step into a traceable, cost-accountable span.

Outcome: Reader can instrument an LLM pipeline or agent workflow with OTEL GenAI conventions, export spans and cost metrics to any compatible backend, and build alerts on real token spend and latency instead of inferring from flat logs.

Why standard code review misses capability escalation in skill manifests, and how to wire a pre-merge conftest policy gate and post-merge SLSA provenance chain that actually work — correcting three common mistakes in the recipes that circulate online.

Outcome: Reader can wire a working pre-merge OPA/conftest gate on skill manifests, add a correct post-merge SLSA L2 provenance workflow using the SLSA GitHub Generator reusable workflow (not the nonexistent slsa CLI), and align OTel instrumentation with the GenAI semantic conventions.

Apr 28, 2026 · 9 min · Platform & AI

What ADK 2.0 Adds, and Where the Approval Path Still Breaks

Why an ADK 2.0 ToolConfirmation flow paired with VertexAiSessionService re-presented the same approval to a reviewer on Monday morning and ran the tool twice, and what the gap tells you about how to evaluate harness primitives at different maturity levels.

Outcome: Reader can map ADK 2.0 primitives onto a session-service backing store and decide which combinations are production-ready, which are beta-with-known-gaps, and which require waiting.

Apr 18, 2026 · 13 min · Platform & AI

Treat Agent Skills Like Supply-Chain Dependencies

A repo-ready operating contract for agent skills that prevents prompt bundles from drifting into unsigned, over-permissioned, unreviewed production dependencies.

Outcome: Defined a hardened-by-default skill contract covering version pins, manifest provenance, prompt review, IO tests, least-privilege tools, runtime isolation, observability, rotation, and decommissioning.

Feb 24, 2026 · 6 min · Platform & AI

DSPy + RAG Evaluation Ops in Production

How to turn DSPy and RAG evaluation into a production release loop with golden sets, retrieval checks, generation rubrics, regression thresholds, and versioned prompt programs.

Outcome: Promoted the note into an essay by defining a repeatable RAG evaluation workflow that separates retrieval quality from generation quality and blocks prompt-program regressions before release.

Feb 10, 2026 · 6 min · Platform & AI

Evaluating Multi-Agent Workflows for Enterprise Reliability

A practical evaluation loop for multi-agent workflows that catches demo-friendly failures in task handoff, tool use, permissions, latency, and completion criteria before release.

Outcome: Established a repeatable evaluation workflow that gates multi-agent releases on task completion, handoff quality, tool correctness, latency, and recoverability instead of demo impressions.

Jan 27, 2026 · 18 min · Platform & AI

Local MCP and Private Open Model Infrastructure

A practical guide to running MCP servers locally, choosing affordable clients, and deploying private open models with Cloud Run, Ollama, and Open WebUI.

Outcome: Separated local agent tool access from private model serving, then defined a safer setup for MCP clients, local servers, and Cloud Run GPU sidecars.

Nov 20, 2025 · 14 min · Platform & AI

Agent Memory Is an Operating Boundary

A practical look at Google ADK memory, Vertex AI Memory Bank, session state, retrieval, retention, access control, and why durable agent memory needs production discipline.

Outcome: Clarified the difference between short-term session state and durable agent memory, then mapped the operational risks around retrieval, security, retention, cost, and memory poisoning.

Nov 16, 2025 · 8 min · Platform & AI

The Question About Your AI Agent Has Changed

Capability is no longer the hard question about AI agents. What the agent is permitted to do, and whether it will do it successfully, are. Here is why that distinction matters architecturally.

Outcome: Reframed agent deployment decisions around permission scope and blast radius rather than capability, reducing the risk of production failures from over-permissioned agentic systems.

Nov 12, 2025 · 13 min · Platform & AI

Codex Plugins Extend Agents, Not Interfaces

Why Codex plugins point toward a different software design mindset: fewer UI extensions, more safe agent capabilities, system access points, and operational boundaries.

Outcome: Framed plugins as reusable agent capability bundles that require structured systems, permissions, predictable workflows, and safer operational surfaces.

Links
