Regarding the training data and replicability
#59, opened by siarez
Are the checkpoints here from Google, trained on Google's data (which they never shared)? Or do they actually come from training on the Wikipedia and BookCorpus datasets that are publicly available on HuggingFace?
In other words, will I be able to replicate this checkpoint by training on https://huggingface.co/datasets/wikipedia and https://huggingface.co/datasets/bookcorpus?
siarez changed discussion title from "Regarding the training data" to "Regarding the training data and replicability"
@siarez Yes, you are right. This model is not fully replicable by training on https://huggingface.co/datasets/wikipedia and https://huggingface.co/datasets/bookcorpus, because those are not the datasets as pre-processed by Google.