Difference to bigscience/bloom-350m
PLEASE DO NOT USE THIS MODEL! IT WILL BE REMOVED SOON. USE https://huggingface.co/bigscience/bloom-560m INSTEAD, WHICH IS THE SAME MODEL.
Is the only difference the name, changed because the original parameter count was wrong, or is there any other difference? The hashes of the model weights, at least, are identical.
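For anyone who wants to repeat the hash check mentioned above, here is a minimal sketch, assuming both weight files have already been downloaded locally. The file paths in the usage part are hypothetical placeholders, not paths taken from either repository.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_weights(path_a, path_b):
    """True if the two files are byte-for-byte identical according to SHA-256."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


if __name__ == "__main__":
    # Hypothetical local paths: adjust to wherever the two checkpoints were saved.
    a = Path("bloom-350m/pytorch_model.bin")
    b = Path("bloom-560m/pytorch_model.bin")
    if a.exists() and b.exists():
        print("identical" if same_weights(a, b) else "different")
    else:
        print("weight files not found; download them first")
```

Identical digests mean the files are byte-for-byte the same, which is a stronger statement than the tensors merely being numerically close.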
Yes, it's just naming. The 350m name is kept only for backwards compatibility, and it will be removed soon.
But it's still the outcome of this slurm job script, right? https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/smaller_models/tr11e-350M-ml.slurm
Alright, but isn't the Slurm script producing a 350m model and not a 560m model?
It's the same model; the two names (parameter counts) differ only in whether the embedding parameters are counted: 560m includes them, 350m does not.
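To make the counting difference concrete, here is a rough sketch of the arithmetic. It assumes a BLOOM-style decoder layout (fused QKV projection, 4x MLP, biases everywhere, LayerNorm after the embedding and at the end, output head tied to the word embeddings) and assumes the config values `hidden_size=1024`, `n_layer=24`, `vocab_size=250880`. None of these values are stated in this thread, so check them against the published `config.json` before relying on the numbers.

```python
def bloom_param_counts(hidden_size, n_layer, vocab_size):
    """Count parameters for an assumed BLOOM-style decoder, with and without embeddings."""
    h = hidden_size

    # Word embedding matrix (assumed to be shared with the output head).
    embedding = vocab_size * h

    per_layer = (
        (h * 3 * h + 3 * h)    # fused query/key/value projection (weight + bias)
        + (h * h + h)          # attention output projection
        + (h * 4 * h + 4 * h)  # MLP up-projection
        + (4 * h * h + h)      # MLP down-projection
        + 2 * (2 * h)          # two LayerNorms per layer (weight + bias each)
    )

    # LayerNorm after the embedding and the final LayerNorm.
    extra_norms = 2 * (2 * h)

    non_embedding = n_layer * per_layer + extra_norms
    return {
        "embedding": embedding,
        "non_embedding": non_embedding,
        "total": embedding + non_embedding,
    }


if __name__ == "__main__":
    # Assumed config values; not taken from this thread.
    counts = bloom_param_counts(hidden_size=1024, n_layer=24, vocab_size=250880)
    for name, value in counts.items():
        print(f"{name:>14}: {value:,}")
```

Under these assumptions the total comes to roughly 559M parameters, of which about 257M are word embeddings, leaving roughly 302M. That is consistent with "560m" counting the embeddings and the older "350m" label being a nominal size that leaves them out.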