Platform & AI Engineering

Software, data, and AI systems built with a product manager's discipline — GCP architecture, BigQuery, Dataform governance, ML pipelines, agent workflows, and the evaluation that makes them trustworthy in production.

Jun 18, 2026 — 8 min — Platform & AI

Agent Frameworks Are Infrastructure Now

A practical 2026 map of AI agent frameworks by infrastructure primitive: orchestration, tools, state, multi-agent delegation, approvals, observability, evaluation, and deployment.

Outcome: Reframed agent-framework selection away from tier lists and toward an operating contract that states which primitives the framework owns, which ones the team must own, and where bespoke orchestration is still justified.

agents agent frameworks mcp orchestration observability

Jun 16, 2026 — 6 min — Platform & AI

Free-Threaded Python Changes the Concurrency Question

A decision map for when Python teams should keep asyncio, use multiprocessing, try subinterpreters, or pilot free-threaded CPython without importing new race conditions.

Outcome: Separated Python concurrency choices by workload, state sharing, dependency readiness, and failure mode so free-threaded builds become a measured pilot instead of a default switch.

python free-threading concurrency performance systems architecture

Jun 16, 2026 — 7 min — Platform & AI

Python Architecture for AI and Data Systems in 2026

A Python architecture map for AI, data, and backend teams that need notebooks, prompts, evaluations, services, repositories, and infrastructure to stop collapsing into one folder.

Outcome: Defined a production Python layout for AI and data systems that separates experimentation, evaluation, domain logic, infrastructure adapters, and deployable service code.

python ai engineering data engineering software architecture evaluation

Jun 15, 2026 — 7 min — Platform & AI

Measure Python Performance Before You Change the Code

A Python performance playbook for choosing data structures, profiling tools, vectorized libraries, JIT experiments, and concurrency changes from evidence instead of taste.

Outcome: Provided a performance triage workflow that asks for a measured bottleneck before changing algorithms, data structures, runtime flags, concurrency models, or native libraries.

python performance profiling data engineering systems

Jun 15, 2026 — 6 min — Platform & AI

The Python Project Skeleton I Want Before the First Feature

A production Python project skeleton that prevents import confusion, dependency drift, and toolchain sprawl before the first API route or model workflow ships.

Outcome: Specified a repo baseline with src layout, uv locking, Ruff, type checks, pytest, dependency groups, and CI gates so Python projects begin with executable architecture.

python project architecture uv ruff ci

Jun 14, 2026 — 7 min — Platform & AI

The 2026 Python Operating Standard Is Boring on Purpose

A practical Python 3.13+ operating standard for teams that need typed, readable, measurable systems without mistaking every new interpreter feature for a production default.

Outcome: Turned a pile of Python 3.13, 3.14, and 3.15-era advice into an adoption contract: stable defaults now, measured pilots for runtime changes, and automated gates before production.

python software architecture engineering standards typing quality gates

Jun 14, 2026 — 7 min — Platform & AI

Typing Turns Python Architecture Into a Contract

How to use modern Python typing, protocols, dataclasses, and payload types to stop raw dictionaries from becoming the hidden architecture of a production system.

Outcome: Gave Python teams a boundary pattern for converting untrusted payloads into typed domain objects before service logic, repositories, and agent tools can depend on them.

python typing domain modeling software architecture api design

Jun 13, 2026 — 6 min — Platform & AI

Pythonic Code in 2026 Is Explicit at the Boundaries

A modern Python style guide for choosing clear comprehensions, explicit None checks, pattern matching, t-strings, and domain exceptions where they improve system behavior.

Outcome: Converted Pythonic style advice into boundary rules for payload dispatch, template rendering, exception design, and readable transformations in production code.

python code quality pattern matching error handling software engineering

May 14, 2026 — 17 min — Platform & AI

dbt on BigQuery Ingestion, Snapshots, and Cost Gates

A dbt on BigQuery starter kit for the parts that usually fail after the demo: raw loads without partition filters, snapshots with weak change detection, and CI that lets expensive SQL promote.

Outcome: Reader can scaffold a dbt and BigQuery project with manifest-backed incremental loads, timestamp-first snapshots, partitioned models, and a dry-run bytes gate before production promotion.

dbt bigquery data engineering analytics engineering ci cost controls

May 8, 2026 — 14 min — Platform & AI

Data Governance with AI in 2026: A Current Map for Operators

Half the 2025 AI-governance recipes still in production cite documents that were rescinded, delayed, or replaced in the last twelve months. The current map: what got retired, what's still authoritative, and what an operating governance program actually has to cover in 2026.

Outcome: Reader can audit their AI/data governance program against the actual 2026 regulatory and standards stack — including the federal rescissions, the EU AI Act timeline shift agreed May 7, 2026, the ISO/IEC 5259 Part 5 publication, and the OWASP Agentic Top 10 — and retire stale references with confidence.

data governance ai governance compliance iso 42001 nist ai rmf eu ai act owasp c2pa policy as code

May 3, 2026 — 7 min — Platform & AI

LLM and Agent Observability with OpenTelemetry GenAI Conventions

Why custom LLM logging leaves you flying blind in production, and how OpenTelemetry's GenAI semantic conventions turn every model call, tool invocation, and agent step into a traceable, cost-accountable span.

Outcome: Reader can instrument an LLM pipeline or agent workflow with OTel GenAI conventions, export spans and cost metrics to any compatible backend, and build alerts on real token spend and latency instead of inferring from flat logs.

observability opentelemetry llms agents monitoring mlops ai engineering

May 2, 2026 — 12 min — Platform & AI

Agent Repo Trust Gates: Conftest Policies, SLSA Provenance, and SBOM in GitHub Actions

Why standard code review misses capability escalation in skill manifests, and how to wire a pre-merge conftest policy gate and post-merge SLSA provenance chain that actually work — correcting three common mistakes in the recipes that circulate online.

Outcome: Reader can wire a working pre-merge OPA/conftest gate on skill manifests, add a correct post-merge SLSA L2 provenance workflow using the SLSA GitHub Generator reusable workflow (not the nonexistent slsa CLI), and align OTel instrumentation with the GenAI semantic conventions.

supply chain github actions slsa conftest opa sbom agents ci/cd security

May 2, 2026 — 13 min — Platform & AI

Comprehension Debt: When Code Ships Without Theory

Why a two-day debug session on a one-month-old AI-generated bug is not a debugging problem but a theory-building step you skipped, and the operating discipline that makes the missing theory recoverable.

Outcome: Reader has a working definition of comprehension debt distinct from technical debt, three questions to test whether a theory exists for an AI-generated component, a PR comprehension scoring rubric, and a deliberate-practice tactic set that prevents the doom loop.

ai coding assistants systems thinking technical debt comprehension developer workflow code review cognitive load

Apr 30, 2026 — 8 min — Platform & AI

The Go and gRPC Version of the SaaS Stack

When a SaaS product should graduate from a flexible Python-first backend to Go, gRPC, Cloud Run, and Google Cloud service boundaries.

Outcome: Mapped a Go and gRPC adoption path for SaaS teams that need stronger service contracts, concurrency, latency discipline, and Google Cloud operations without premature rewrites.

gcp go grpc cloud run software architecture

Apr 29, 2026 — 10 min — Platform & AI

BigQuery Keys in dbt Are Optimizer Hints, Not Enforcement

How to use BigQuery primary and foreign key constraints from dbt without mistaking optimizer metadata for enforced data integrity.

Outcome: Defined a BigQuery and dbt constraint playbook that keeps optimizer hints, dbt contracts, data tests, compiled SQL review, and INFORMATION_SCHEMA verification in the right order.

bigquery dbt data contracts analytics engineering query optimization data quality

Apr 29, 2026 — 15 min — Platform & AI

Your Repo Needs an Agent Harness, Not More Prompt Paste

A critical guide to README.md, AGENTS.md, CLAUDE.md, SKILL.md, .agents, and .claude patterns for teams that want coding agents to follow repo rules without stuffing every workflow into one giant prompt.

Outcome: Defined a repo documentation harness that separates human orientation, always-loaded agent rules, tool-specific compatibility files, on-demand skills, dynamic docs, and deterministic enforcement.

coding agents agent skills agents.md claude code developer workflow

Apr 28, 2026 — 9 min — Platform & AI

What ADK 2.0 Adds, and Where the Approval Path Still Breaks

Why an ADK 2.0 ToolConfirmation flow paired with VertexAiSessionService re-presented the same approval to a reviewer on Monday morning and ran the tool twice, and what the gap tells you about how to evaluate harness primitives at different maturity levels.

Outcome: Reader can map ADK 2.0 primitives onto a session-service backing store and decide which combinations are production-ready, which are beta-with-known-gaps, and which require waiting.

agents adk agent harness human in the loop memory

Apr 28, 2026 — 11 min — Platform & AI

Why I Reach for DuckDB When Reading Parquet from Swift or Zig

What an oversized iOS binary, a Linux linker error, and a SQL boundary teach about embedding DuckDB as the Parquet reader for languages without a mature native library.

Outcome: Reader can decide when DuckDB is the right Parquet path for a Swift or Zig project, configure the SPM and build.zig integrations correctly the first time, and avoid the binary-size and linker failures that the unconfigured path produces.

data engineering parquet duckdb swift zig

Apr 26, 2026 — 10 min — Platform & AI

Minimal ML Examples Are Better as Review Maps Than Cheatsheets

How a compact Python ML cheatsheet becomes useful when synthetic demos, metrics, pipelines, and version drift are tied to the model-review decisions they can actually defend.

Outcome: Reader can use minimal scikit-learn examples as smoke tests for task framing, metric choice, pipeline boundaries, and environment drift instead of treating them as production recipes.

machine learning scikit-learn model evaluation mlops python

Apr 25, 2026 — 12 min — Platform & AI

Every Engineer Is a Manager Now

AI coding agents are turning software work into management work: engineers now have to manage intent, context, agent output, teammate coordination, stakeholder evidence, and long-term maintenance.

Outcome: Defined a public operating model for engineers and consultants who need to coordinate human teammates and AI agents without producing artifacts that create hidden technical debt.

ai agents engineering leadership software process technical communication consulting

Apr 24, 2026 — 9 min — Platform & AI

Reading Parquet from Elixir and Mojo Without Pretending the Runtime Is Native

Why a precompiled-NIF fall-through on a less-common Linux target adds quiet minutes to a deploy, and what the borrowed-runtime pattern actually looks like for Elixir and Mojo.

Outcome: Reader can ship Parquet-reading Elixir without surprise source compilation in CI, recognize where Mojo's Python interop boundary is the bottleneck rather than Mojo itself, and know which DataFrame guarantees leak at the BEAM and PyArrow boundaries.

data engineering parquet elixir mojo deployment

Apr 22, 2026 — 8 min — Platform & AI

Building an NPS Classifier You Can Actually Act On

A scikit-learn NPS ordinal classifier with SMOTE, probability calibration, utility-based thresholding, and PSI drift checks: the parts that make it useful to the retention team, not just accurate on a dashboard.

Outcome: Shipped a calibrated multiclass NPS model with a utility-driven operating threshold and a PSI-based drift loop, giving the retention team a per-customer detractor probability they can act on and a rule for when to retrain.

ml nps classification calibration drift evaluation

Apr 21, 2026 — 14 min — Platform & AI

Coding Assistants Work Best When the Blast Radius Is Small

An Android-first operating pattern for using GitHub Copilot, Amazon Q Developer, Android CLI, and Android skills without letting coding assistants rewrite Gradle, manifests, architecture, and security posture by accident.

Outcome: Defined a repeatable assistant workflow for Android teams that combines sliced prompts, repo instructions, Android skills, screenshots, atomic commits, tests, and GHAS gates into one controlled development loop.

coding assistants android github copilot amazon q mobile engineering

Apr 20, 2026 — 11 min — Platform & AI

How I Read Parquet in Rust and Go Without an OOM

Why a default Go parquet.Read[T] call slurped a 1.4 GB file into 11 GB of resident memory, and the column-native Rust and Go patterns that replaced it.

Outcome: Reader can pick the streaming Parquet read path in Rust and Go, configure the compression-codec features explicitly, and avoid the eager-load anti-patterns that look fine in benchmarks and break in production.

data engineering parquet rust go memory safety

Apr 18, 2026 — 13 min — Platform & AI

Treat Agent Skills Like Supply-Chain Dependencies

A repo-ready operating contract for agent skills that prevents prompt bundles from drifting into unsigned, over-permissioned, unreviewed production dependencies.

Outcome: Defined a hardened-by-default skill contract covering version pins, manifest provenance, prompt review, IO tests, least-privilege tools, runtime isolation, observability, rotation, and decommissioning.

agent skills supply chain agents security developer workflow

Apr 17, 2026 — 15 min — Platform & AI

AI Coding Assistants Expose Process Debt

Why teams using Claude, GPT-style coding agents, Cursor, and Copilot often get unstable results on app work when requirements, versions, conventions, tests, and handoffs are implicit.

Outcome: Defined a docs-first assistant workflow that turns requirements, pinned stack choices, task slices, review loops, tests, and Git checkpoints into a repeatable way to ship with AI without surrendering architecture control.

coding assistants software process ai agents developer workflow technical leadership

Apr 14, 2026 — 4 min — Platform & AI

What AI Researchers Do That I Do Not

A short, honest read on what AI researchers actually do day to day, written from outside the role by an applied engineer who reads papers when the work demands it.

Outcome: Reader can distinguish AI research work from applied AI engineering work, decide which research outputs change their quarter and which do not, and avoid hiring or being hired against the wrong role description.

ai engineering ai research engineering discipline career

Apr 1, 2026 — 19 min — Platform & AI

A Software Architecture Reading Path for Working Engineers

A practical reading path through software design, architecture, system design interviews, data-intensive applications, and systems analysis for engineers who want to grow beyond implementation.

Outcome: Reviewed the architecture and system design books from the DEV Community list, corrected the list count, summarized each book, and arranged them into a practical learning path.

software architecture software engineering systems design reading list engineering growth

Mar 20, 2026 — 8 min — Platform & AI

Fine-Tuning GPT-OSS 20B on a 64GB MacBook Pro

A practical MLX-first recipe for experimenting with openai/gpt-oss-20b on a 64GB Apple Silicon Mac without mistaking local LoRA work for CUDA-scale training.

Outcome: Defined a local 64GB MacBook Pro fine-tuning path for GPT-OSS 20B that prioritizes Harmony formatting, MLX quantized LoRA, small evals, and a clear fallback to NVIDIA when scale is required.

gpt-oss mlx apple silicon llm fine-tuning local ai

Mar 16, 2026 — 7 min — Platform & AI

Fine-Tuning LLMs on a MacBook Pro With MPS and MLX

Why Apple Silicon is useful for local LLM prototyping and LoRA experiments, but still has sharp boundaries compared with CUDA-scale NeMo or Hugging Face training.

Outcome: Separated Mac-local MPS and MLX fine-tuning paths from NVIDIA-only training features so local experiments can start with realistic hardware expectations.

apple silicon mlx pytorch mps llm fine-tuning

Mar 12, 2026 — 9 min — Platform & AI

The Faster Transformers Stack Behind GPT-OSS

Why Hugging Face's faster Transformers work matters beyond GPT-OSS, and how kernels, MXFP4, parallelism, KV cache, batching, and model loading change practical LLM runtime decisions.

Outcome: Mapped the GPT-OSS-era Transformers runtime features into concrete decisions about memory, compute, cache behavior, batching, and serving boundaries.

transformers gpt-oss hugging face inference model performance

Mar 8, 2026 — 10 min — Platform & AI

Fine-Tuning LLMs Is an Operating Loop, Not a Training Command

Why LLM fine-tuning projects fail when teams jump to NeMo or Hugging Face training commands before deciding the model, data, evaluation, serving, and governance loop.

Outcome: Defined a fine-tuning operating loop that connects base-model choice, data curation, PEFT, evaluation, distributed training, serving, and governance into one repeatable release path.

llm fine-tuning nemo hugging face peft llmops

Mar 4, 2026 — 11 min — Platform & AI

NVFP4 and the Infrastructure Meaning of Precision

A grounded read of NVIDIA's NVFP4 training post and why 4-bit pretraining matters for model quality, token throughput, cost, and AI infrastructure strategy.

Outcome: Explained NVIDIA's NVFP4 training recipe, separated the credible technical signal from the marketing surface, and connected low-precision training to practical AI infrastructure decisions.

llm training nvidia quantization model efficiency ai infrastructure

Feb 28, 2026 — 15 min — Platform & AI

Context Engineering Keeps Long Context Useful

A practical synthesis of Drew Breunig, Simon Willison, and Anthropic on how long contexts fail, how to fix them, and why multi-agent systems need context discipline.

Outcome: Turned long-context failure modes into an engineering playbook for selecting, isolating, pruning, summarizing, offloading, and evaluating context in agent systems.

agents context engineering llm evaluation tool use multi-agent systems

Feb 24, 2026 — 20 min — Platform & AI

From Algorithms to AI Systems

A practical map from algorithmic complexity to software engineering, data pipelines, machine learning systems, and modern LLM architecture decisions.

Outcome: Connected classical algorithm analysis to production software, ML pipelines, RAG systems, model serving, and the trade-offs behind modern AI research.

algorithms machine learning llm systems data engineering systems design

Feb 24, 2026 — 6 min — Platform & AI

DSPy + RAG Evaluation Ops in Production

How to turn DSPy and RAG evaluation into a production release loop with golden sets, retrieval checks, generation rubrics, regression thresholds, and versioned prompt programs.

Outcome: Promoted the note into an essay by defining a repeatable RAG evaluation workflow that separates retrieval quality from generation quality and blocks prompt-program regressions before release.

dspy rag evaluation mlops agents

Feb 10, 2026 — 6 min — Platform & AI

Evaluating Multi-Agent Workflows for Enterprise Reliability

A practical evaluation loop for multi-agent workflows that catches demo-friendly failures in task handoff, tool use, permissions, latency, and completion criteria before release.

Outcome: Established a repeatable evaluation workflow that gates multi-agent releases on task completion, handoff quality, tool correctness, latency, and recoverability instead of demo impressions.

agents evaluation reliability enterprise ai observability

Feb 4, 2026 — 18 min — Platform & AI

Machine Learning Terms That Make Model Reviews Better

A practical ML terminology guide for model reviews where feature definitions, data splits, task type, optimization behavior, overfitting risk, regularization, ensembles, and embeddings need to be discussed precisely.

Outcome: Gave peers a review-ready vocabulary for inspecting ML systems by connecting core terms to design choices, failure modes, and release questions.

machine learning model evaluation feature engineering neural networks mlops

Jan 31, 2026 — 10 min — Platform & AI

The Preprocessing Boundary Between scikit-learn and PyTorch

A production-friendly pattern for pairing scikit-learn preprocessing graphs with PyTorch models so training and inference use the same feature contract.

Outcome: Defined an artifact contract that keeps column preprocessing, feature order, model weights, metadata, and inference behavior synchronized across batch and serving environments.

machine learning pytorch scikit-learn mlops model serving

Jan 28, 2026 — 6 min — Platform & AI

Dataform + BigQuery Governance Release Patterns

A Dataform and BigQuery case study for turning data contracts, release lanes, validation gates, rollback behavior, and cost checks into one governed promotion path.

Outcome: Reduced contract-break risk in the sanitized release pattern by making schema, freshness, cost, and downstream impact checks part of promotion instead of after-the-fact review.

dataform bigquery data contracts release engineering gcp

Jan 27, 2026 — 18 min — Platform & AI

Local MCP and Private Open Model Infrastructure

A practical guide to running MCP servers locally, choosing affordable clients, and deploying private open models with Cloud Run, Ollama, and Open WebUI.

Outcome: Separated local agent tool access from private model serving, then defined a safer setup for MCP clients, local servers, and Cloud Run GPU sidecars.

mcp agents cloud run ollama open webui

Jan 19, 2026 — 18 min — Platform & AI

API Design for MCP Server Boundaries

A Confluence-ready guide for designing durable HTTP APIs and wrapping them safely as Model Context Protocol servers.

Outcome: Turned general API design guidance into a practical standard for HTTP APIs that back MCP servers, with current protocol corrections, checklists, and source links.

api design mcp agent systems software architecture platform engineering

Jan 15, 2026 — 11 min — Platform & AI

When 0.3 Does Not Mean 30 Percent

How imbalanced classifiers can keep a strong AUC while producing probabilities that break thresholds, alerts, and cost-sensitive decisions in production.

Outcome: Defined a production calibration gate that logs Brier score, ECE, reliability diagrams, cost-sensitive thresholds, run metadata, and promotion criteria for imbalanced classifiers.

ml calibration classification evaluation probability reliability

Jan 12, 2026 — 5 min — Platform & AI

Compliant GCP Platform Playbook for Analytics and ML

A sanitized GCP platform case study where compliance, analytics delivery, and ML feature access had to be designed as one release path instead of three disconnected workstreams.

Outcome: Reduced governed dataset onboarding from weeks to days in the sanitized pattern while preserving auditability, cost visibility, and promotion rules for analytics and ML use cases.

gcp bigquery governance analytics ml

Jan 11, 2026 — 12 min — Platform & AI

scikit-learn Pipelines That Survive Tuning and Deployment

Why tabular models drift between notebooks and production when preprocessing, sample metadata, hyperparameter search, and persistence are not treated as one scikit-learn pipeline contract.

Outcome: Defined a scikit-learn pipeline contract that keeps column preprocessing, metadata routing, hyperparameter search, evaluation, and deployment artifacts reproducible across dev, stage, and production.

machine learning scikit-learn mlops model persistence tabular data

Jan 7, 2026 — 20 min — Platform & AI

Statistics for Data Science, Written for Software Developers

A software-developer guide to the statistics that actually change data-science decisions: samples, estimates, uncertainty, effect size, bias, probability, distributions, and model metrics.

Outcome: Defined a practical estimate-review workflow that helps software developers report effect size, confidence intervals, p-values, sampling bias, and classification metrics without treating statistics as glossary trivia.

statistics data science machine learning model evaluation experimentation

Dec 30, 2025 — 12 min — Platform & AI

Vertex AI Feature Store Is the Production Loop

A production-focused Vertex AI post on turning raw data, BigQuery features, online feature serving, model endpoints, monitoring, and retraining into one governed ML loop instead of another platform checklist.

Outcome: Defined a concrete Vertex AI feature-serving loop with source contracts, BigQuery feature views, point-in-time training exports, endpoint serving rules, monitoring thresholds, and retraining triggers.

gcp vertex ai feature store mlops gemini

Dec 26, 2025 — 10 min — Platform & AI

Vertex AI Makes More Sense as an MLOps Map

A Vertex AI architecture map for teams that need to decide which Google Cloud AI services belong in the ML lifecycle, where ownership changes hands, and which older assumptions are now unsafe.

Outcome: Gave teams an operating contract for using Vertex AI across data, features, training, deployment, monitoring, and generative AI without mistaking a product menu for a production ML system.

gcp vertex ai mlops feature store model monitoring

Dec 22, 2025 — 15 min — Platform & AI

Correlation Is a Feature Screen, Not a Feature Strategy

A long-form feature-screening workflow that uses correlation for quick linear checks, then adds redundancy clustering, mutual information, chi-squared tests, L1 models, tree importances, permutation importance, and domain review.

Outcome: Defined a practical feature review loop that prevents teams from dropping useful nonlinear signals or keeping redundant features just because a correlation heatmap looked convincing.

machine learning feature selection correlation scikit-learn model evaluation

Nov 20, 2025 — 14 min — Platform & AI

Agent Memory Is an Operating Boundary

A practical look at Google ADK memory, Vertex AI Memory Bank, session state, retrieval, retention, access control, and why durable agent memory needs production discipline.

Outcome: Clarified the difference between short-term session state and durable agent memory, then mapped the operational risks around retrieval, security, retention, cost, and memory poisoning.

agents google cloud adk memory rag

Nov 16, 2025 — 8 min — Platform & AI

The Question About Your AI Agent Has Changed

Capability is no longer the hard question about AI agents. The hard questions are what the agent is permitted to do and whether it will do it successfully. Here is why that distinction matters architecturally.

Outcome: Reframed agent deployment decisions around permission scope and blast radius rather than capability, reducing the risk of production failures from over-permissioned agentic systems.

agents ai governance enterprise ai authorization security

Nov 12, 2025 — 13 min — Platform & AI

Codex Plugins Extend Agents, Not Interfaces

Why Codex plugins point toward a different software design mindset: fewer UI extensions, more safe agent capabilities, system access points, and operational boundaries.

Outcome: Framed plugins as reusable agent capability bundles that require structured systems, permissions, predictable workflows, and safer operational surfaces.

codex agents plugins mcp software architecture

Nov 8, 2025 — 13 min — Platform & AI

Sandboxed Agents and the Production Automation Boundary

OpenAI's April 2026 Agents SDK update matters because sandboxed execution, manifests, resumable state, and memory move agents closer to real production automation.

Outcome: Framed sandboxed agent execution as an architecture boundary for safer, stateful, long-running automation instead of another demo-layer SDK feature.

agents openai sandboxing automation enterprise ai

Nov 4, 2025 — 15 min — Platform & AI

AI Strategy Starts Before the Model

A practical AI strategy framework with a worked example that connects business levers, data readiness, pilots, evaluation, governance, deployment, and operating metrics.

Outcome: Defined an end-to-end AI strategy playbook and worked example that ties data readiness, use-case selection, model development, governance, deployment, and operating ownership to measurable business outcomes.

ai strategy data strategy mlops llmops business outcomes

Oct 31, 2025 — 14 min — Platform & AI

Cloud Run GPU Sidecars Need Deployment Discipline

A practical deployment guide for running Ollama behind Open WebUI on Cloud Run GPUs without mixing service specs, model storage modes, sidecar startup order, or auth assumptions.

Outcome: Clarified Cloud Run GPU sidecar deployment choices so model storage, service YAML, startup ordering, authentication, and billing constraints are explicit before launch.

gcp cloud run gpu ollama open webui

Oct 27, 2025 — 10 min — Platform & AI

In-Warehouse Inference on Snowflake and BigQuery

A practical runbook for scoring changed rows close to the data using Snowflake Streams and Tasks or BigQuery scheduled queries and remote models.

Outcome: Compared Snowflake and BigQuery patterns for scheduled in-warehouse inference, corrected CDC assumptions, and defined monitoring, grants, and deployment checks.

snowflake bigquery mlops inference data engineering

Oct 23, 2025 — 14 min — Platform & AI

What a Data Strategist Actually Does

A practical view of data strategy as the operating discipline that connects business goals, governance, KPIs, platforms, analytics, ML, and AI delivery.

Outcome: Connected data roadmaps, governance, KPI design, platform delivery, and stakeholder alignment so analytics and AI initiatives produced measurable business decisions.

data strategy data governance analytics gcp decision intelligence

Oct 19, 2025 — 12 min — Platform & AI

When the Model Should Say It Doesn't Know: Conformal Prediction Sets with MAPIE

How to add coverage-guaranteed prediction sets, temperature scaling calibration, and risk-coverage curves to a classifier using MAPIE — the pieces that make uncertainty quantification operationally useful rather than decorative.

Outcome: Added coverage-guaranteed prediction sets and operational abstention gates to a classification pipeline, cutting the acted-upon error rate without retraining the model.

ml conformal-prediction calibration uncertainty mapie selective-prediction

Oct 15, 2025 — 18 min — Platform & AI

Fine-Tuning Open Source LLMs With NVIDIA NeMo

A practical map of NVIDIA NeMo for teams that want to curate data, fine-tune open-source LLMs, evaluate them, and move from research checkpoints to production inference.

Outcome: Separated data curation, fine-tuning, alignment, evaluation, export, and serving concerns so open-source LLM customization could move from experiments to governed production workflows.

nemo llm fine-tuning mlops gpu training enterprise ai

Oct 11, 2025 — 16 min — Platform & AI

Plain-Language Machine Learning Metrics for Real Decisions

A practical explanation of ML metrics with decision tables for regression tolerance, rare-event classification, threshold tradeoffs, and the failure case where accuracy looked good but the decision failed.

Outcome: Clarified how metric choice, threshold design, tree-based pattern discovery, and logit interpretation affect whether ML outputs are useful for action.

machine learning model evaluation classification regression interpretability

Oct 7, 2025 — 7 min — Platform & AI

Probability Calibration Is an Operating Control

A practical playbook for turning classifier scores into reliable probabilities that can support ranking, thresholds, SLAs, and cost-sensitive decisions.

Outcome: Defined a calibration workflow that separates ranking from probability quality, uses scikit-learn calibration correctly, and carries thresholds and monitoring into production.

machine learning calibration mlops classification model evaluation

Oct 3, 2025 — 7 min — Platform & AI

The Three-Run Lab: How I Triage Slow PyTorch Training

A repeatable triage routine — the three-run baseline, DataLoader diagnosis, five profiler signatures, and a copy-paste scaffold — for finding where training time actually goes before touching the model.

Outcome: Identified and resolved training bottlenecks in under an hour by running the three-run baseline and reading profiler signatures before changing any model code.

pytorch ml training performance profiling debugging

Sep 29, 2025 — 7 min — Platform & AI

PyTorch Training Throughput: The Patterns That Actually Move the Number

torch.compile, mixed precision, gradient accumulation, DDP vs FSDP, and the profiler — the five levers I reach for before rethinking the model architecture.

Outcome: Cut training wall-clock time and GPU memory pressure by applying compile, AMP, and accumulation patterns in sequence before ever touching model architecture.

pytorch ml training performance gpu distributed-training

Sep 25, 2025 — 12 min — Platform & AI

A scikit-learn Pipeline for Calibrated Decisions

A production-friendly scikit-learn pattern for mixed tabular data, class imbalance, calibrated probabilities, threshold selection, and model persistence.

Outcome: Defined an end-to-end scikit-learn classification pipeline that keeps preprocessing, imbalance handling, probability calibration, evaluation, thresholding, and production artifacts aligned.

machine learning scikit-learn calibration classification mlops

Sep 21, 2025 — 14 min — Platform & AI

Algorithm Complexity as Engineering Judgment

A practical way to use algorithm complexity in product engineering, from choosing data structures to designing recommendation features that do not collapse as data grows.

Outcome: Explained how algorithm complexity shows up in everyday product work, then walked through an e-commerce recommendation feature from naive loops to indexed lookup.

software engineering algorithms systems design typescript performance