Training steps
#3, opened by malteos
Given that this is an intermediate checkpoint, from what training step is this checkpoint?
Yeah, I think so. To go over all the training samples you would need ~430,000 steps, so yes, it corresponds to approximately 36%.
And according to https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/smaller_models/tr11f-6B3-ml.slurm the batch size is indeed 512.
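For anyone who wants to redo the arithmetic, here is a rough sketch. It only uses the approximate figures quoted in this thread (~430,000 total steps, batch size 512, ~36% progress) and assumes a constant batch size for the whole run, which may not hold if the schedule ramps the batch size up. The function name `checkpoint_estimate` is made up for illustration.

```python
# Approximate figures taken from this thread, not exact values.
TOTAL_STEPS = 430_000   # ~steps needed to go over all training samples
BATCH_SIZE = 512        # batch size reported from the linked slurm script
FRACTION_DONE = 0.36    # ~share of training this checkpoint corresponds to


def checkpoint_estimate(total_steps, batch_size, fraction_done):
    """Estimate the checkpoint step and sample counts.

    Assumes a constant batch size for the whole run (an assumption,
    not something confirmed in the thread).
    """
    step = round(total_steps * fraction_done)
    samples_seen = step * batch_size
    total_samples = total_steps * batch_size
    return step, samples_seen, total_samples


step, samples_seen, total_samples = checkpoint_estimate(
    TOTAL_STEPS, BATCH_SIZE, FRACTION_DONE
)
print(f"estimated step: ~{step:,}")
print(f"samples seen:   ~{samples_seen:,} of ~{total_samples:,}")
```

Since both inputs are rounded, the resulting step number is only a ballpark estimate.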
malteos changed discussion status to closed