About the training data and details
What is the composition and quantity of the training data, and will there be technical reports?
The technical report will be available this week.
I will reply to you after that.
I have pasted some evaluation results from the technical report, for preview only.
You can check our news report (in Chinese).
Check out this arXiv preprint: "ChemLLM: A Chemical Large Language Model".
Is the code open-sourced on GitHub? I would like to learn more about the technical details.
The ChemLLM datasets are all open source now!
https://huggingface.co/papers/2402.06852
ChemData700K, an SFT dataset of 700K samples for chemistry LLMs!
https://huggingface.co/datasets/AI4Chem/ChemData700K
ChemPref-10K, a DPO dataset of 10K samples, in both English and Chinese!
https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-en
https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-cn
ChemBench-4K, a benchmark of 4,100 high-quality single-choice questions covering nine core chemistry tasks!
https://huggingface.co/datasets/AI4Chem/ChemBench4K
C-MHChem, 600 real exam questions, written and checked manually, drawn from 25 years of Chinese national middle and high school chemistry tests!
https://huggingface.co/datasets/AI4Chem/C-MHChem-Benchmark-Chinese-Middle-high-school-Chemistry-Test
All hail the open-source community! 🤗
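For anyone who wants to pull these into a script, here is a minimal sketch. It assumes the Hugging Face `datasets` library is installed (`pip install datasets`) and that each repo loads with its default configuration, which I have not verified. The repo IDs are copied from the links above; the dictionary keys and the helper names `hub_url` and `load_chemllm_dataset` are hypothetical, made up for this example.

```python
# Hub repo IDs copied from the dataset links in this thread.
# The short keys on the left are hypothetical labels for this sketch.
CHEMLLM_DATASETS = {
    "sft": "AI4Chem/ChemData700K",
    "dpo_en": "AI4Chem/ChemPref-DPO-for-Chemistry-data-en",
    "dpo_cn": "AI4Chem/ChemPref-DPO-for-Chemistry-data-cn",
    "bench": "AI4Chem/ChemBench4K",
    "cmhchem": "AI4Chem/C-MHChem-Benchmark-Chinese-Middle-high-school-Chemistry-Test",
}


def hub_url(key):
    """Return the dataset page URL for one of the short keys above."""
    return f"https://huggingface.co/datasets/{CHEMLLM_DATASETS[key]}"


def load_chemllm_dataset(key, **kwargs):
    """Download one dataset from the Hub (needs network access).

    Extra keyword arguments are passed through to datasets.load_dataset,
    e.g. split="train" if the repo has such a split (an assumption).
    """
    # Imported lazily so the URL helper works without the library installed.
    from datasets import load_dataset

    return load_dataset(CHEMLLM_DATASETS[key], **kwargs)


# Example (downloads data, so it is left commented out):
# bench = load_chemllm_dataset("bench")
# print(bench)  # shows whatever splits and columns the repo actually has
```

Printing the returned object is the quickest way to see the real split and column names, since none are assumed here.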