Tag: transformers

4 entries tagged "transformers" — 1 post, 3 links.

Posts

Mar 12, 2026 — 9 min — Platform & AI

The Faster Transformers Stack Behind GPT-OSS

Why Hugging Face's faster Transformers work matters beyond GPT-OSS, and how kernels, MXFP4, parallelism, KV cache, batching, and model loading change practical LLM runtime decisions.

Outcome: Mapped the GPT-OSS-era Transformers runtime features into concrete decisions about memory, compute, cache behavior, batching, and serving boundaries.

transformers, gpt-oss, hugging face, inference, model performance

Links

Article · huggingface.co · Apr 21, 2026

Tricks from OpenAI gpt-oss You Can Use with Transformers

Hugging Face

This is the runtime stack link for the gpt-oss moment: kernels from the Hub, MXFP4, tensor and expert parallelism, dynamic KV cache behavior, continuous batching, and faster loading.

Worth keeping because it connects model release excitement to the boring but decisive parts of deployment: memory, cache shape, batching, and what hardware the trick actually runs on.
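To make "memory, cache shape, batching" concrete, here is a rough back-of-envelope sketch of how KV cache size grows with batch size and context length under plain full attention. The function name `kv_cache_bytes` and the example model shape (layer count, KV heads, head dimension) are illustrative assumptions, not figures taken from the linked article or from gpt-oss.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_value=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Assumes every layer caches full-length keys and values (no sliding
    window, no eviction) and a fixed width per stored value
    (2 bytes corresponds to fp16/bf16).
    """
    # Factor of 2: one key tensor plus one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=8192, batch_size=16)
print(f"{size / 2**30:.1f} GiB")  # prints "16.0 GiB"
```

With this shape each token costs 128 KiB of cache, so one 8,192-token sequence takes 1 GiB and a batch of 16 takes 16 GiB. That is why batch size and context length are serving decisions, not just model properties.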

Article · gregorygundersen.com · Apr 11, 2026

A History of Large Language Models

Gregory Gundersen

This is the kind of long historical synthesis that helps engineers stop treating transformers as a sudden miracle. It traces the ideas through distributed representations, language modeling, attention, transformers, and generative pretraining.

Worth keeping because architectural judgment improves when the current stack has a history. The tradeoffs feel less arbitrary when you can see what each generation was trying to solve.

llms, ai history, transformers, ai engineering

Article · sakana.ai · Mar 29, 2026

An Evolved Universal Transformer Memory

Sakana AI

This is the primary source behind the memory-optimization link in the saved list. Sakana's Neural Attention Memory Models are interesting because they try to learn what a transformer should remember or forget rather than keeping every token equally alive.

Worth keeping, with caution. Memory savings are exciting, but production systems still need to ask what was discarded, when that is safe, and how failures show up in evaluation.
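The "what was discarded" question can be illustrated with a minimal sketch of score-based cache eviction. This is not Sakana's method: NAMMs learn the scoring, whereas here the per-token importance scores are supplied by hand, and the helper name `evict_tokens` is hypothetical. The point is only that an eviction step can return the dropped positions so they can be logged and checked in evaluation.

```python
def evict_tokens(scores, keep):
    """Split token positions into (kept, evicted) by importance score.

    scores: one importance value per cached token position.
    keep:   how many positions to retain.
    Ties are broken in favor of the more recent (higher) position.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Rank positions from most to least important.
    ranked = sorted(range(len(scores)),
                    key=lambda i: (scores[i], i), reverse=True)
    kept = sorted(ranked[:keep])      # restore original token order
    evicted = sorted(ranked[keep:])   # report what was thrown away
    return kept, evicted


# Hand-picked scores for five cached tokens (illustrative values only).
kept, evicted = evict_tokens([0.9, 0.1, 0.4, 0.05, 0.7], keep=3)
print(kept, evicted)  # prints "[0, 2, 4] [1, 3]"
```

Returning the evicted positions alongside the kept ones makes it possible to record what the cache forgot and to correlate those drops with failures during evaluation.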

llm memory, transformers, ai research, ai engineering
