---
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- trl |
|
- text-generation-inference |
|
- unsloth |
|
- mistral |
|
- gguf |
|
base_model: teknium/OpenHermes-2.5-Mistral-7B |
|
datasets: |
|
- sayhan/strix-philosophy-qa |
|
library_name: transformers |
|
--- |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65aa2d4b356bf23b4a4da247/nN4JZlIMeF-K2sFYfhLLT.png) |
|
# OpenHermes 2.5 Strix Philosophy Mistral 7B
|
- **Finetuned by:** [sayhan](https://huggingface.co/sayhan) |
|
- **License:** [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) |
|
- **Finetuned from model:** [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)
|
- **Dataset:** [sayhan/strix-philosophy-qa](https://huggingface.co/datasets/sayhan/strix-philosophy-qa) |
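
A minimal inference sketch with `transformers` is shown below. The repository id is assumed from the model title, so replace it with this repo's actual id; OpenHermes 2.5 models use the ChatML prompt format, which the tokenizer's chat template applies:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model title; replace with this repo's actual id.
model_id = "sayhan/OpenHermes-2.5-Strix-Philosophy-Mistral-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a ChatML prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "What is the trolley problem?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```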
|
--- |
|
- **LoRA rank:** 8
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Rank-stabilized LoRA:** Yes
- **Number of epochs:** 3
- **Learning rate:** 1e-5
- **Batch size:** 2
- **Gradient accumulation steps:** 4
- **Weight decay:** 0.01
- **Target modules:**
|
  - Query projection (`q_proj`)
  - Key projection (`k_proj`)
  - Value projection (`v_proj`)
  - Output projection (`o_proj`)
  - Gate projection (`gate_proj`)
  - Up projection (`up_proj`)
  - Down projection (`down_proj`)
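
For reference, these settings map onto a PEFT `LoraConfig` and `transformers` `TrainingArguments` roughly as follows. This is a minimal sketch rather than the exact training script (the run used Unsloth, and `output_dir` is a placeholder):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments

# The QA dataset the model was finetuned on.
dataset = load_dataset("sayhan/strix-philosophy-qa")

# LoRA setup mirroring the hyperparameters listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    use_rslora=True,  # rank-stabilized LoRA
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)

# Optimizer and schedule settings mirroring the values listed above.
training_args = TrainingArguments(
    output_dir="outputs",  # placeholder path
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.01,
)
```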
|
|
|
|
|
|