---
license: mit
library_name: "trl"
tags:
- DPO
base_model: Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT
model-index:
- name: Weni/WeniGPT-2.7.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation
  results: []
language: ['pt']
---

# Weni/WeniGPT-2.7.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation

This model is a fine-tuned version of [Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT](https://huggingface.co/Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT), trained on the Weni/LLM_Base_2.0.3_DPO dataset with the DPO trainer. It is part of the DPO project for [Weni](https://weni.ai/).

It achieves the following results on the evaluation set:
- eval_loss: 0.6931
- eval_runtime: 134.8325
- eval_samples_per_second: 3.642
- eval_steps_per_second: 0.912
- eval_rewards/chosen: 0.0
- eval_rewards/rejected: 0.0
- eval_rewards/accuracies: 0.0
- eval_rewards/margins: 0.0
- eval_logps/rejected: -202.8358
- eval_logps/chosen: -64.1918
- eval_logits/rejected: -1.9818
- eval_logits/chosen: -1.6430
- epoch: 0.01

Note that the eval loss of 0.6931 is approximately ln 2, the value the DPO loss takes when the chosen/rejected reward margin is zero, which is consistent with the zero reward and accuracy metrics at epoch 0.01.

## Intended uses & limitations

This model has not been trained to avoid specific instructions.
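
For quick experimentation, the model can be loaded like any other Hugging Face causal LM. The snippet below is a minimal sketch, not an official usage example: the 4-bit loading flag mirrors the bitsandbytes quantization listed under the training hyperparameters, and the prompt follows the Zephyr-style template shown under the training procedure.

```python
# Minimal inference sketch (an illustration, not an official example).
# Assumes a CUDA GPU plus the transformers and bitsandbytes packages.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weni/WeniGPT-2.7.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # mirrors the 4-bit bitsandbytes training setup
    device_map="auto",
)

# Zephyr-style prompt, following the template under "Training procedure".
prompt = "<|user|>Qual é a capital do Brasil?</s>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```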

## Training procedure

Finetuning was done on the model Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT with the following prompt:

```
Question:
<|user|>{question}</s>


Chosen:
<|assistant|>{correct_ans}</s>


Rejected:
<|assistant|>{rejected_ans}</s>
```
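
The `{question}`, `{correct_ans}`, and `{rejected_ans}` placeholders correspond to one preference pair per dataset row. As a hedged illustration (the actual preprocessing code is not published), a row could be rendered into the `prompt`/`chosen`/`rejected` fields that trl's DPO trainer expects like this:

```python
# Illustrative only: renders one preference pair into the template above.
# The helper name and example row are hypothetical.
def format_dpo_example(row: dict) -> dict:
    return {
        "prompt": f"<|user|>{row['question']}</s>",
        "chosen": f"<|assistant|>{row['correct_ans']}</s>",
        "rejected": f"<|assistant|>{row['rejected_ans']}</s>",
    }

example = {
    "question": "Qual é a capital do Brasil?",
    "correct_ans": "A capital do Brasil é Brasília.",
    "rejected_ans": "A capital do Brasil é o Rio de Janeiro.",
}
print(format_dpo_example(example))
```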

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 4
- num_gpus: 1
- total_train_batch_size: 32
- optimizer: AdamW
- lr_scheduler_type: cosine
- num_steps: 1
- quantization_type: bitsandbytes
- LoRA:
  - bits: 4
  - use_exllama: True
  - device_map: auto
  - use_cache: False
  - lora_r: 8
  - lora_alpha: 16
  - lora_dropout: 0.1
  - bias: none
  - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
  - task_type: CAUSAL_LM
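
A rough reconstruction of how these values might be wired together with `transformers`, `peft`, and `trl` is sketched below. This is an assumption based on the listed hyperparameters, not the actual training script; the output directory, dataset split, and `beta` value are illustrative.

```python
# Hypothetical reconstruction from the hyperparameters above; the real
# training script is not published, so names and defaults are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_4bit=True,   # quantization_type: bitsandbytes, bits: 4
    device_map="auto",   # device_map: auto
    use_cache=False,     # use_cache: False
)

peft_config = LoraConfig(
    r=8,                 # lora_r
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="dpo-output",        # illustrative
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # 8 * 4 * 1 GPU = total batch of 32
    lr_scheduler_type="cosine",
    max_steps=1,                    # num_steps: 1
    optim="adamw_torch",            # optimizer: AdamW
)

# Assumes the dataset exposes (or is mapped to) prompt/chosen/rejected columns,
# e.g. via the formatting helper shown earlier; the split name is an assumption.
train_dataset = load_dataset("Weni/LLM_Base_2.0.3_DPO", split="train")

trainer = DPOTrainer(
    model,
    ref_model=None,      # with a PEFT config, trl uses the frozen base as reference
    beta=0.1,            # trl default; the value actually used is not listed
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```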

### Training results

### Framework versions

- git+https://github.com/huggingface/transformers@main
- datasets==2.17.1
- peft==0.8.2
- safetensors==0.4.2
- evaluate==0.4.1
- bitsandbytes==0.42
- huggingface_hub==0.20.3
- seqeval==1.2.2
- optimum==1.17.1
- auto-gptq==0.7.0
- gpustat==1.1.1
- deepspeed==0.13.2
- wandb==0.16.3
- git+https://github.com/huggingface/trl.git@main
- git+https://github.com/huggingface/accelerate.git@main
- coloredlogs==15.0.1
- traitlets==5.14.1
- autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl

### Hardware
- Cloud provider: runpod.io