---
license: other
library_name: peft
tags:
- llama-factory
- lora
- generated_from_trainer
base_model:
- jiazhengli/Mixtral-8x7B-Instruct-v0.1-QLoRA-Assessment-Rationale-sft
- mistralai/Mixtral-8x7B-Instruct-v0.1
model-index:
- name: sft_trained_woaqa_mixtral_dpo
  results: []
datasets:
- jiazhengli/Rationale_MCTS
- jiazhengli/Synthetic_Rationale
language:
- en
metrics:
- accuracy
- f1
---

# Mixtral-8x7B-Instruct-v0.1-QLoRA-Assessment-Rationale-dpo

This is the model trained without private data from the EMNLP 2024 Findings paper: Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring.

- **Paper:** [Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring](https://arxiv.org/abs/2406.19949) (EMNLP 2024 Findings)
- **GitHub Repository:** [Thought Tree Assessment Repository](https://github.com/lijiazheng99/thought_tree_assessment)

## Intended uses & limitations

This model is intended as a resource for research on explainable AI in educational technology. It was trained on **noisy** response-level rationales, which makes its outputs **unsuitable** for direct use in high-stakes assessments without additional verification.

## Training and evaluation data

We trained and evaluated the model on the [Synthetic Rationale data](https://huggingface.co/datasets/jiazhengli/Synthetic_Rationale), which was generated from the [Rationale MCTS data](https://huggingface.co/datasets/jiazhengli/Rationale_MCTS).

To extract scores from rationales, please use the [jiazhengli/deberta-v3-large-Rationale-to-Score](https://huggingface.co/jiazhengli/deberta-v3-large-Rationale-to-Score) model.
## Citation

Please cite the following work if you utilize this model:

**BibTeX:**

```bibtex
@misc{li2024calibratingllmspreferenceoptimization,
  title={Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring},
  author={Jiazheng Li and Hainiu Xu and Zhaoyue Sun and Yuxiang Zhou and David West and Cesare Aloisi and Yulan He},
  year={2024},
  eprint={2406.19949},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2406.19949},
}
```

## Training procedure

Please refer to our [paper](https://arxiv.org/abs/2406.19949).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 3.0
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.1726        | 0.33  | 200  | 2.6079          | 14.8760        | 13.6948          | 0.5929             | 1.1812          | -199.3195      | -162.4827    | -0.7607         | -0.7827       |
| 1.0028        | 0.67  | 400  | 2.6743          | 14.9730        | 13.8342          | 0.5844             | 1.1388          | -197.9255      | -161.5126    | -0.8669         | -0.8743       |
| 0.5127        | 1.0   | 600  | 2.5239          | 15.4063        | 13.9931          | 0.6000             | 1.4132          | -196.3373      | -157.1801    | -0.8501         | -0.8561       |
| 0.3787        | 1.33  | 800  | 2.5951          | 15.2695        | 13.9112          | 0.6142             | 1.3582          | -197.1555      | -158.5480    | -0.8385         | -0.8358       |
| 0.381         | 1.67  | 1000 | 2.5814          | 15.0186        | 13.4813          | 0.6213             | 1.5373          | -201.4548      | -161.0572    | -0.7846         | -0.7808       |
| 0.2993        | 2.0   | 1200 | 2.5816          | 15.0307        | 13.3917          | 0.6383             | 1.6390          | -202.3505      | -160.9355    | -0.7590         | -0.7554       |
| 0.2917        | 2.33  | 1400 | 2.6270          | 14.5370        | 12.8885          | 0.6426             | 1.6485          | -207.3829      | -165.8732    | -0.8337         | -0.8292       |
| 0.2881        | 2.67  | 1600 | 2.6358          | 14.3849        | 12.6973          | 0.6468             | 1.6875          | -209.2946      | -167.3941    | -0.8503         | -0.8468       |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
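
### Checking the reported numbers

The hyperparameters and training results above can be cross-checked with a little arithmetic. The sketch below is not part of the authors' training code. It uses only values copied from this card and checks two things: that `Rewards/margins` matches `Rewards/chosen - Rewards/rejected` up to rounding, and that `total_train_batch_size` equals `train_batch_size * gradient_accumulation_steps`. The single-device setting (`num_devices=1`) is an assumption inferred from 1 × 8 = 8, and the function names are illustrative. The pairs-per-epoch figure is only a rough estimate, because the logged epoch values are rounded.

```python
# Values copied from the "Training results" table on this card:
# (step, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (200, 14.8760, 13.6948, 1.1812),
    (400, 14.9730, 13.8342, 1.1388),
    (600, 15.4063, 13.9931, 1.4132),
    (800, 15.2695, 13.9112, 1.3582),
    (1000, 15.0186, 13.4813, 1.5373),
    (1200, 15.0307, 13.3917, 1.6390),
    (1400, 14.5370, 12.8885, 1.6485),
    (1600, 14.3849, 12.6973, 1.6875),
]


def margin_gaps(rows):
    """Return |(chosen - rejected) - reported margin| for every row."""
    return [
        abs((chosen - rejected) - margin)
        for _step, chosen, rejected, margin in rows
    ]


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer step.

    num_devices=1 is an assumption: the card does not state the device count.
    """
    return per_device * accumulation_steps * num_devices


# Largest disagreement between the reported and recomputed margins.
max_gap = max(margin_gaps(ROWS))

# train_batch_size=1 and gradient_accumulation_steps=8, as listed on the card.
total_batch = effective_batch_size(per_device=1, accumulation_steps=8)

# The table logs epoch 1.0 at step 600, so one epoch is about 600 optimizer steps.
approx_pairs_per_epoch = 600 * total_batch

print(f"max margin gap: {max_gap:.4f}")
print(f"effective batch size: {total_batch}")
print(f"approx. preference pairs per epoch: {approx_pairs_per_epoch}")
```

The recomputed margins agree with the table to within rounding of the fourth decimal place. Note that `Rewards/margins` and `Rewards/accuracies` rise over training while the validation loss stays roughly flat.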