---
base_model:
- akjindal53244/Llama-3.1-Storm-8B
- Sao10K/L3.1-8B-Niitama-v1.1
- v000000/L3.1-Niitorm-8B-t0.0001
library_name: transformers
tags:
- merge
- llama
- dpo
datasets:
- jondurbin/gutenberg-dpo-v0.1
---
# Llama-3.1-Niitorm-8B-DPO
* *DPO Trained, Llama3.1-8B.*
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/QeNjtwolNpxUmpo9NL7VI.png)
<b>New: DPO'd Gutenberg Version (full epoch training).</b>
A roleplay (RP) model: Niitama 1.1 as the base, nearswapped with "Storm", one of the smartest Llama 3.1 models. Mostly abliterated.
-------------------------------------------------------------------------------
*The Gutenberg dataset produces more human, writer-like prose and greatly lessens synthetic-feeling outputs.*
-------------------------------------------------------------------------------
## Finetune and merge
This is a merge and finetune of pre-trained language models.
*Resultant merge finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) for 1 epoch, 1.5e-5 learning rate, on Nvidia A100.
## Merge Details
### Merge Method
This model was merged using the <b>NEARSWAP t0.0001</b> merge algorithm.
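To make the role of `t` concrete, here is a minimal NumPy sketch of the nearswap idea as it is commonly described: the secondary model's weights are blended into the base only where the two already nearly agree, with a blend weight of `t / |base - other|` clamped to `[0, 1]`. The function name `nearswap` and the toy arrays are illustrative assumptions, not the exact code used to build this model.

```python
import numpy as np

def nearswap(base: np.ndarray, other: np.ndarray, t: float) -> np.ndarray:
    """Illustrative sketch: pull `base` toward `other` only where they are already close."""
    diff = np.abs(base - other)
    # Blend weight is inversely proportional to the parameter difference.
    with np.errstate(divide="ignore", invalid="ignore"):
        weight = t / diff
    # Identical parameters give a division by zero; treat those as a full swap.
    weight = np.nan_to_num(weight, nan=1.0, posinf=1.0, neginf=1.0)
    weight = np.clip(weight, 0.0, 1.0)
    # Linear interpolation between the two tensors.
    return (1.0 - weight) * base + weight * other

# Toy tensors (hypothetical values, not real model weights).
base = np.array([1.0, 1.0, 1.0])
other = np.array([1.0, 1.00005, 2.0])
merged = nearswap(base, other, t=0.0001)
print(merged)
```

With `t = 0.0001`, a parameter that differs by less than `t` is taken from the secondary model outright, while one that differs by `1.0` moves only 0.01% of the way, so the base model dominates almost everywhere.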
### Models Merged
The following models were included in the merge:
* Base Model: [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1) + [grimjim/Llama-3-Instruct-abliteration-LoRA-8B](https://huggingface.co/grimjim/Llama-3-Instruct-abliteration-LoRA-8B)
* [akjindal53244/Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: Sao10K/L3.1-8B-Niitama-v1.1+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
        layer_range: [0, 32]
      - model: akjindal53244/Llama-3.1-Storm-8B
        layer_range: [0, 32]
merge_method: nearswap
base_model: Sao10K/L3.1-8B-Niitama-v1.1+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
parameters:
  t:
    - value: 0.0001
dtype: bfloat16
# Then, DPO Finetune
```
*Resultant merge finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) for 1 epoch, 1.5e-5 learning rate, on Nvidia A100.
*I used a higher learning rate and the full dataset compared to "L3.1-Celestial-Stone-2x8B-DPO".*
# Prompt Template:
```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{output}<|eot_id|>
```
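The template can be filled in with plain string formatting. The sketch below is a hypothetical helper (`build_prompt` is not part of any library) that follows the line breaks exactly as printed above and stops after the assistant header so the model generates the reply. If the tokenizer shipped with the model carries its own chat template, treat that as authoritative.

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Hypothetical helper: fill the prompt template up to the assistant turn."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )

# Example values are placeholders.
prompt = build_prompt("You are a narrator.", "Describe the harbor at dawn.")
print(prompt)
```

Generation should stop when the model emits `<|eot_id|>`, which closes the assistant turn in the template.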
Credit to Alchemonaut.
Credit to jondurbin.
Credit to woofwolfy.