Hugging Face
gpt-oss, transformers, gpu, ai engineering
This is the runtime-stack link for the gpt-oss moment: kernels loaded from the Hub, MXFP4 quantization, tensor and expert parallelism, dynamic KV cache behavior, continuous batching, and faster model loading.
Worth keeping because it connects model-release excitement to the boring but decisive parts of deployment: memory, cache shape, batching, and which hardware each trick actually runs on.
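
To make the memory and cache-shape point concrete, here is a rough back-of-envelope sketch. It assumes the standard MXFP4 layout (4-bit elements sharing one 8-bit scale per block of 32) and a plain 16-bit KV cache. The function names and the example dimensions are hypothetical and chosen for illustration; they are not the actual gpt-oss configuration or any Transformers API. Real checkpoints also keep some tensors in higher precision, so treat the numbers as lower bounds.

```python
def mxfp4_weight_bytes(num_params: int, block_size: int = 32) -> float:
    """Approximate storage for num_params weights in MXFP4.

    Assumes 4 bits per element plus one shared 8-bit scale per block,
    which works out to 4.25 bits per weight at block_size=32.
    """
    bits_per_param = 4 + 8 / block_size
    return num_params * bits_per_param / 8


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for a full-attention cache.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    assumes a 16-bit cache dtype.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model: 20B quantized parameters, illustrative dimensions only.
    weights_gib = mxfp4_weight_bytes(20_000_000_000) / 2**30
    cache_gib = kv_cache_bytes(layers=24, kv_heads=8, head_dim=64,
                               seq_len=4096, batch=16) / 2**30
    print(f"weights ~{weights_gib:.1f} GiB, KV cache ~{cache_gib:.1f} GiB")
```

The cache term grows linearly with both sequence length and batch size, which is why cache behavior and batching strategy matter as much as the weight format once the model is actually serving requests.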