Which scheduler is used during training?
After reading the paper, I have some questions about training. First, which scheduler is used during training? Second, is the training timestep t sampled at random from a fixed list? For example, when distilling from 128 steps to 32 steps, is t for the teacher drawn only from a fixed set such as [128, 256, 384, 512, ...]?
The exact formulation is provided in the paper. It corresponds to the Euler scheduler.
Timesteps are selected using the Trailing method. The exact teacher timesteps are:
128 steps: 999 991 983 976 968 960 952 944 936 929 921 913 905 897 890 882 874 866 858 851 843 835 827 819 812 804 796 788 780 772 765 757 749 741 733 726 718 710 702 694 686 679 671 663 655 647 640 632 624 616 608 601 593 585 577 569 562 554 546 538 530 522 515 507 499 491 483 476 468 460 452 444 436 429 421 413 405 397 390 382 374 366 358 351 343 335 327 319 312 304 296 288 280 272 265 257 249 241 233 226 218 210 202 194 186 179 171 163 155 147 140 132 124 116 108 101 93 85 77 69 62 54 46 38 30 22 15 7
32 steps: 999 968 936 905 874 843 812 780 749 718 686 655 624 593 562 530 499 468 436 405 374 343 312 280 249 218 186 155 124 93 62 30
8 steps: 999 874 749 624 499 374 249 124
4 steps: 999 749 499 249
2-step and 1-step models are trained on 4-step timesteps.
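For anyone who wants to reproduce these lists, here is a minimal sketch of trailing spacing. It assumes 1000 training timesteps (inferred from the lists starting at 999) and round-half-to-even rounding, which is what Python's built-in `round` does. The function name `trailing_timesteps` is made up for illustration and is not taken from the training code.

```python
def trailing_timesteps(num_steps, num_train_timesteps=1000):
    """Return `num_steps` timesteps using trailing spacing.

    Assumption: 1000 training timesteps, so the schedule ends on the
    last trained timestep (999) and walks back in equal strides.
    """
    step_ratio = num_train_timesteps / num_steps
    # Subtract 1 before rounding; Python's round() rounds halves to even,
    # e.g. 936.5 -> 936 and 811.5 -> 812.
    return [round(num_train_timesteps - i * step_ratio - 1) for i in range(num_steps)]


for n in (128, 32, 8, 4):
    print(f"{n} steps:", *trailing_timesteps(n))
```

With these assumptions the output should match the lists above. Note that the order of subtracting 1 and rounding matters: rounding first and subtracting afterwards gives slightly different values at the .5 boundaries (for example 937 instead of 936 in the 32-step list).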
Thanks for the answer