---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e6rate_01beta_cSFTDPO
  results: []
---


# IE_L3_1000steps_1e6rate_01beta_cSFTDPO

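This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT), trained with DPO using TRL (per the card's tags). The training dataset was not recorded by the Trainer.
At the final checkpoint (step 1000), it achieves the following results on the evaluation set: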
- Loss: 0.1802
- Rewards/chosen: -0.8216
- Rewards/rejected: -13.7782
- Rewards/accuracies: 0.7400
- Rewards/margins: 12.9566
- Logps/rejected: -213.4093
- Logps/chosen: -91.0134
- Logits/rejected: -0.8670
- Logits/chosen: -0.7142
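
The reward metrics above are related by simple arithmetic. The sketch below assumes the usual TRL convention that `Rewards/margins` is the mean chosen reward minus the mean rejected reward, and checks that the reported values are consistent with it. The variable names are illustrative only.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.8216
rewards_rejected = -13.7782
reported_margin = 12.9566

# Assumption: margin = mean chosen reward - mean rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
```

Under the same convention, `Rewards/accuracies` of 0.7400 would mean that the chosen response received a higher reward than the rejected one on 74% of evaluation pairs.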

## Model description

More information needed

## Intended uses & limitations

More information needed
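
Until usage guidance is filled in, the following is a minimal loading sketch using the standard `transformers` causal-LM API. The repository id is an assumption (the base model's namespace combined with this card's model name), and the `generate` helper, its defaults, and the prompt format are hypothetical, not documented by the author.

```python
# Assumed repository id: base model's namespace plus this card's model name.
MODEL_ID = "tsavage68/IE_L3_1000steps_1e6rate_01beta_cSFTDPO"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return a completion for `prompt` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads the weights on first use, so it needs network access and enough memory for a Llama 3 checkpoint.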

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
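
To make the schedule concrete, here is a small sketch of a cosine learning-rate schedule with linear warmup, using the values listed above. It assumes the common formulation (linear ramp to the peak rate, then a single half-cosine decay to zero) and a single training device. It is an illustration, not the Trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 1e-06
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2


def lr_at(step: int) -> float:
    """Linear warmup followed by a single half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: one device, so effective batch = per-device batch x accumulation steps.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.2e}")
print("effective batch size:", effective_batch)
```

The effective batch size computed this way matches the reported `total_train_batch_size` of 4, which is consistent with single-device training.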

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1913        | 0.4   | 50   | 0.1803          | -0.5046        | -8.7772          | 0.7400             | 8.2726          | -163.3993      | -87.8437     | -0.8451         | -0.7284       |
| 0.1386        | 0.8   | 100  | 0.1802          | -1.0228        | -11.9098         | 0.7400             | 10.8870         | -194.7255      | -93.0261     | -0.8546         | -0.7152       |
| 0.1386        | 1.2   | 150  | 0.1802          | -0.6732        | -12.7363         | 0.7400             | 12.0631         | -202.9905      | -89.5298     | -0.8582         | -0.7093       |
| 0.1733        | 1.6   | 200  | 0.1802          | -0.6775        | -12.8705         | 0.7400             | 12.1930         | -204.3321      | -89.5723     | -0.8611         | -0.7114       |
| 0.2253        | 2.0   | 250  | 0.1802          | -0.7149        | -13.0474         | 0.7400             | 12.3326         | -206.1017      | -89.9464     | -0.8603         | -0.7104       |
| 0.1386        | 2.4   | 300  | 0.1802          | -0.7327        | -13.0995         | 0.7400             | 12.3668         | -206.6222      | -90.1248     | -0.8593         | -0.7091       |
| 0.1213        | 2.8   | 350  | 0.1802          | -0.7598        | -13.2905         | 0.7400             | 12.5307         | -208.5327      | -90.3961     | -0.8621         | -0.7116       |
| 0.1906        | 3.2   | 400  | 0.1802          | -0.7893        | -13.4540         | 0.7400             | 12.6647         | -210.1669      | -90.6907     | -0.8653         | -0.7135       |
| 0.1906        | 3.6   | 450  | 0.1802          | -0.7880        | -13.4497         | 0.7400             | 12.6617         | -210.1245      | -90.6778     | -0.8657         | -0.7141       |
| 0.2079        | 4.0   | 500  | 0.1802          | -0.8075        | -13.6024         | 0.7400             | 12.7949         | -211.6511      | -90.8724     | -0.8653         | -0.7127       |
| 0.1560        | 4.4   | 550  | 0.1802          | -0.8042        | -13.6207         | 0.7400             | 12.8165         | -211.8345      | -90.8401     | -0.8658         | -0.7138       |
| 0.1213        | 4.8   | 600  | 0.1802          | -0.8154        | -13.6478         | 0.7400             | 12.8323         | -212.1049      | -90.9520     | -0.8661         | -0.7139       |
| 0.1906        | 5.2   | 650  | 0.1802          | -0.8263        | -13.7419         | 0.7400             | 12.9156         | -213.0464      | -91.0612     | -0.8667         | -0.7144       |
| 0.2426        | 5.6   | 700  | 0.1802          | -0.8316        | -13.7569         | 0.7400             | 12.9253         | -213.1964      | -91.1135     | -0.8668         | -0.7144       |
| 0.2599        | 6.0   | 750  | 0.1802          | -0.8155        | -13.7626         | 0.7400             | 12.9471         | -213.2537      | -90.9532     | -0.8669         | -0.7141       |
| 0.1213        | 6.4   | 800  | 0.1802          | -0.8348        | -13.7975         | 0.7400             | 12.9627         | -213.6019      | -91.1453     | -0.8666         | -0.7139       |
| 0.2426        | 6.8   | 850  | 0.1802          | -0.8359        | -13.7784         | 0.7400             | 12.9425         | -213.4111      | -91.1564     | -0.8664         | -0.7143       |
| 0.1733        | 7.2   | 900  | 0.1802          | -0.8274        | -13.7943         | 0.7400             | 12.9670         | -213.5706      | -91.0716     | -0.8673         | -0.7144       |
| 0.1386        | 7.6   | 950  | 0.1802          | -0.8173        | -13.7791         | 0.7400             | 12.9618         | -213.4180      | -90.9708     | -0.8670         | -0.7140       |
| 0.1560        | 8.0   | 1000 | 0.1802          | -0.8216        | -13.7782         | 0.7400             | 12.9566         | -213.4093      | -91.0134     | -0.8670         | -0.7142       |
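
The Epoch and Step columns allow a rough estimate of the training set size, which the card does not otherwise state. The sketch below assumes the Epoch values are exact to the displayed precision and that every optimizer step consumes `total_train_batch_size` examples, so the result is an approximation.

```python
# Values copied from the first table row and the hyperparameter list.
step = 50
epoch = 0.4
total_train_batch_size = 4
training_steps = 1000

# Optimizer steps needed to complete one pass over the training data.
steps_per_epoch = round(step / epoch)

# Approximate number of training examples (assumes no partial final batch).
approx_train_examples = steps_per_epoch * total_train_batch_size

# Number of passes over the data completed by the end of training.
total_epochs = training_steps / steps_per_epoch

print("steps per epoch:", steps_per_epoch)
print("approximate training examples:", approx_train_examples)
print("total epochs:", total_epochs)
```

The derived total of 8 epochs agrees with the Epoch value in the last table row.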


### Framework versions

- Transformers 4.44.2
- PyTorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1