NVIDIA Technical Blog,
gpumodel trainingquantizationnvidia
This belongs in the queue because fine-tuning and inference decisions increasingly depend on numerical formats, not only model architecture. NVFP4 is a reminder that performance work moves through the whole stack: math representation, kernels, hardware, training stability, and serving economics.
The tradeoff is hardware specificity. A format can be a breakthrough and still be irrelevant to a Mac-local or non-NVIDIA deployment path.