Recommend some datasets

#5
by chadqiu - opened

Some open-source datasets:
SkyPile-150B, a Chinese dataset from Skywork-13B: https://huggingface.co/datasets/Skywork/SkyPile-150B
WanJuan, a Chinese and English dataset from InternLM: https://opendatalab.org.cn/OpenDataLab/WanJuan1_dot_0
Dolma, an English dataset of 3T tokens: https://huggingface.co/datasets/allenai/dolma
