Gregory Gundersen,
llms, ai history, transformers, ai engineering
This is the kind of long historical synthesis that helps engineers stop treating transformers as a sudden miracle. It traces the ideas through distributed representations, language modeling, attention, transformers, and generative pretraining.
Worth keeping because architectural judgment improves when the current stack has a history. The tradeoffs feel less arbitrary when you can see what each generation was trying to solve.