Trained from scratch on MNIST: https://huggingface.co/datasets/mnist
ViT_Small: {"chw": (1, 28, 28), "n_patches": 7, "n_blocks": 4, "hidden_d": 8, "n_heads": 4, "out_d": 10} | 23 kB | 2K+ steps
ViT_Large: {"chw": (1, 28, 28), "n_patches": 7, "n_blocks": 6, "hidden_d": 64, "n_heads": 8, "out_d": 10} | 881 kB | 20K+ steps
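
To make the two configurations easier to compare, the sketch below derives the patch size, flattened patch length, sequence length, and per-head width from each config. It rests on two assumptions that the listing does not state: that "n_patches" counts patches per image side (so 7 gives a 7x7 grid), and that a single class token is prepended to the patch sequence. The helper name derived_shapes is hypothetical and not part of any released code.

```python
# Configs copied verbatim from the listing above.
VIT_SMALL = {"chw": (1, 28, 28), "n_patches": 7, "n_blocks": 4,
             "hidden_d": 8, "n_heads": 4, "out_d": 10}
VIT_LARGE = {"chw": (1, 28, 28), "n_patches": 7, "n_blocks": 6,
             "hidden_d": 64, "n_heads": 8, "out_d": 10}


def derived_shapes(cfg):
    """Derive basic tensor shapes from a config dict.

    Assumptions (not confirmed by the source):
      * n_patches is the number of patches per image side.
      * one class token is prepended to the patch sequence.
    """
    c, h, w = cfg["chw"]
    n = cfg["n_patches"]
    if h % n != 0 or w % n != 0:
        raise ValueError("image size must be divisible by n_patches")
    if cfg["hidden_d"] % cfg["n_heads"] != 0:
        raise ValueError("hidden_d must be divisible by n_heads")

    patch_h, patch_w = h // n, w // n
    return {
        "patch_size": (patch_h, patch_w),               # pixels per patch
        "patch_dim": c * patch_h * patch_w,             # flattened patch length
        "seq_len": n * n + 1,                           # patches + class token
        "head_dim": cfg["hidden_d"] // cfg["n_heads"],  # width of each head
    }


if __name__ == "__main__":
    for name, cfg in (("ViT_Small", VIT_SMALL), ("ViT_Large", VIT_LARGE)):
        print(name, derived_shapes(cfg))
```

Under these assumptions both models see the same 4x4-pixel patches and the same sequence length; they differ in depth ("n_blocks"), embedding width ("hidden_d"), and per-head width.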