Taras Sereda, Gimlet Blog,
pytorch, apple silicon, gpu, local inference
This is a strong local-inference link because it shows the gap between framework support and hardware-specific performance work. Apple Silicon can be a serious inference target, but getting there often means writing or tuning kernels, not just calling model.to("mps").
The AI-generated-kernel angle is interesting, but the bigger point is operational: when the backend is the bottleneck, model.to("mps") is only the start.
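To make "only the start" concrete, here is a minimal sketch of the baseline you would measure before any kernel work: move a model to MPS when it is available, then time the forward pass with explicit synchronization. The toy model, the helper names (`pick_device`, `sync`, `mean_latency_ms`), and the iteration counts are illustrative assumptions, not anything taken from the linked post.

```python
import time

import torch
import torch.nn as nn


def pick_device() -> torch.device:
    """Use the MPS backend when PyTorch reports it as available, else CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def sync(device: torch.device) -> None:
    """MPS work is queued asynchronously, so wait for it before reading the clock."""
    if device.type == "mps":
        torch.mps.synchronize()


def mean_latency_ms(model: nn.Module, x: torch.Tensor, device: torch.device,
                    iters: int = 20, warmup: int = 5) -> float:
    """Average forward-pass latency in milliseconds on the given device."""
    model = model.to(device).eval()
    x = x.to(device)
    with torch.no_grad():
        for _ in range(warmup):  # warm-up runs are excluded from the timing
            model(x)
        sync(device)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        sync(device)
    return (time.perf_counter() - start) * 1000 / iters


if __name__ == "__main__":
    # Hypothetical toy model, used only to exercise the harness.
    toy = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 10))
    batch = torch.randn(8, 256)
    device = pick_device()
    print(f"{device.type}: {mean_latency_ms(toy, batch, device):.3f} ms per forward pass")
```

Running the same harness on "cpu" and "mps" gives a first read on whether the backend is actually helping. If the MPS number is disappointing, that is the point where per-op profiling and custom kernels come in.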