Tag: mlops
13 entries tagged "mlops": 13 posts, 0 links.
Posts
Why custom LLM logging leaves you flying blind in production, and how OpenTelemetry's GenAI semantic conventions turn every model call, tool invocation, and agent step into a traceable, cost-accountable span.
Outcome: Reader can instrument an LLM pipeline or agent workflow with OTEL GenAI conventions, export spans and cost metrics to any compatible backend, and build alerts on real token spend and latency instead of inferring from flat logs.
How a compact Python ML cheatsheet becomes useful when synthetic demos, metrics, pipelines, and version drift are tied to the model-review decisions they can actually defend.
Outcome: Reader can use minimal scikit-learn examples as smoke tests for task framing, metric choice, pipeline boundaries, and environment drift instead of treating them as production recipes.
How to turn DSPy and RAG evaluation into a production release loop with golden sets, retrieval checks, generation rubrics, regression thresholds, and versioned prompt programs.
Outcome: Defined a repeatable RAG evaluation workflow that separates retrieval quality from generation quality and blocks prompt-program regressions before release.
A practical ML terminology guide for model reviews where feature definitions, data splits, task type, optimization behavior, overfitting risk, regularization, ensembles, and embeddings need to be discussed precisely.
Outcome: Gave peers a review-ready vocabulary for inspecting ML systems by connecting core terms to design choices, failure modes, and release questions.
A production-friendly pattern for pairing scikit-learn preprocessing graphs with PyTorch models so training and inference use the same feature contract.
Outcome: Defined an artifact contract that keeps column preprocessing, feature order, model weights, metadata, and inference behavior synchronized across batch and serving environments.
Why tabular models drift between notebooks and production when preprocessing, sample metadata, hyperparameter search, and persistence are not treated as one scikit-learn pipeline contract.
Outcome: Defined a scikit-learn pipeline contract that keeps column preprocessing, metadata routing, hyperparameter search, evaluation, and deployment artifacts reproducible across dev, stage, and production.
A production-focused Vertex AI post on turning raw data, BigQuery features, online feature serving, model endpoints, monitoring, and retraining into one governed ML loop instead of another platform checklist.
Outcome: Defined a concrete Vertex AI feature-serving loop with source contracts, BigQuery feature views, point-in-time training exports, endpoint serving rules, monitoring thresholds, and retraining triggers.
A Vertex AI architecture map for teams that need to decide which Google Cloud AI services belong in the ML lifecycle, where ownership changes hands, and which older assumptions are now unsafe.
Outcome: Gave teams an operating contract for using Vertex AI across data, features, training, deployment, monitoring, and generative AI without confusing a product menu for a production ML system.
A practical AI strategy framework with a worked example that connects business levers, data readiness, pilots, evaluation, governance, deployment, and operating metrics.
Outcome: Defined an end-to-end AI strategy playbook and worked example that ties data readiness, use-case selection, model development, governance, deployment, and operating ownership to measurable business outcomes.
A practical runbook for scoring changed rows close to the data using Snowflake Streams and Tasks or BigQuery scheduled queries and remote models.
Outcome: Compared Snowflake and BigQuery patterns for scheduled in-warehouse inference, corrected CDC assumptions, and defined monitoring, grants, and deployment checks.
A practical map of NVIDIA NeMo for teams that want to curate data, fine-tune open-source LLMs, evaluate them, and move from research checkpoints to production inference.
Outcome: Separated data curation, fine-tuning, alignment, evaluation, export, and serving concerns so open-source LLM customization could move from experiments to governed production workflows.
A practical playbook for turning classifier scores into reliable probabilities that can support ranking, thresholds, SLAs, and cost-sensitive decisions.
Outcome: Defined a calibration workflow that separates ranking from probability quality, uses scikit-learn calibration correctly, and carries thresholds and monitoring into production.
A production-friendly scikit-learn pattern for mixed tabular data, class imbalance, calibrated probabilities, threshold selection, and model persistence.
Outcome: Defined an end-to-end scikit-learn classification pipeline that keeps preprocessing, imbalance handling, probability calibration, evaluation, thresholding, and production artifacts aligned.