|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Intel/orca_dpo_pairs |
|
--- |
|
|
|
traversaal-2.5-Mistral-7B is trained via Direct Preference Optimization (DPO) from teknium/OpenHermes-2.5-Mistral-7B as its base model, with several hyperparameter optimizations.
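The DPO objective referred to above can be sketched for a single preference pair as below. This is an illustrative helper, not the actual training code; the `beta` default of 0.1 is a common choice, not necessarily the value used for this model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of log-probabilities.

    beta scales the implicit KL penalty toward the reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed in a numerically stable way
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# When the policy prefers the chosen response more strongly than the
# reference model does, the loss drops below log(2), the value at
# indifference.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Minimizing this loss over a preference dataset (here, Intel/orca_dpo_pairs) pushes the policy to rank chosen responses above rejected ones while staying close to the reference model.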
|
teknium/OpenHermes-2.5-Mistral-7B was itself trained via Supervised Fine-Tuning (SFT) using LoRA, with Mistral-7B as its base model.
|
Note that we did not use any form of weight merging.
|
For leaderboard submission, the trained weights were realigned for compatibility with Mistral-7B.
|
|
|
|
|
|