---
library_name: transformers
license: apache-2.0
base_model: tsavage68/IE_M2_1000steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_M2_1000steps_1e7rate_01beta_cSFTDPO
  results: []
---


# IE_M2_1000steps_1e7rate_01beta_cSFTDPO

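This model is a fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT), trained with Direct Preference Optimization (DPO) via TRL (per the `trl` and `dpo` tags) on an unknown dataset.
It achieves the following results on the evaluation set at the final checkpoint (step 1000):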
- Loss: 0.3743
- Rewards/chosen: -0.3291
- Rewards/rejected: -6.1017
- Rewards/accuracies: 0.4600
- Rewards/margins: 5.7727
- Logps/rejected: -102.0393
- Logps/chosen: -45.4965
- Logits/rejected: -2.8684
- Logits/chosen: -2.8050
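The reward metrics above follow the usual DPO definitions. The sketch below assumes TRL's default sigmoid loss. Each pair's implicit reward is `beta` times the policy-minus-reference log-probability. The margin is the chosen reward minus the rejected reward, the loss is `-log sigmoid(margin)`, and accuracy counts pairs whose chosen reward is strictly greater than the rejected one. `beta = 0.1` is an assumed value: the model name contains `01beta`, but the hyperparameter list does not state it. `dpo_metrics` is a hypothetical helper, not a TRL function.

```python
import math
from statistics import mean


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss and reward statistics for a batch of preference pairs.

    Each argument is a list of summed sequence log-probabilities, one per pair.
    beta=0.1 is an assumption; this card does not list beta.
    """
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    # margin = beta * (policy log-ratio - reference log-ratio) for each pair
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    losses = [-log_sigmoid(m) for m in margins]
    return {
        "loss": mean(losses),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Strict ">" means a tied pair counts as incorrect.
        "rewards/accuracies": mean(
            1.0 if c > r else 0.0 for c, r in zip(chosen_rewards, rejected_rewards)
        ),
        "rewards/margins": mean(margins),
    }


# Usage: one tied pair (margin 0) and one clearly separated pair (margin 10.5).
metrics = dpo_metrics(
    policy_chosen=[-50.0, -45.0],
    policy_rejected=[-80.0, -180.0],
    ref_chosen=[-50.0, -50.0],
    ref_rejected=[-80.0, -80.0],
)
print(metrics)
```

A tied pair contributes ln 2 ≈ 0.6931 to the loss and counts as inaccurate. This may help in reading the final numbers: 0.54 × ln 2 ≈ 0.3743, which matches the reported loss. That is consistent with about 54% of evaluation pairs having a near-zero margin and the remaining 46% having large positive margins (giving a mean margin of 5.7727 over all pairs), although the card itself does not confirm this breakdown.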

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
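The `cosine` schedule with 100 warmup steps can be pictured with the sketch below. It assumes the default behavior of `transformers.get_cosine_schedule_with_warmup`: a linear warmup from 0 to the peak rate, then a half-cosine decay that reaches 0 at step 1000 (`num_cycles=0.5`). It also assumes a single training device, so that `total_train_batch_size = 2 x 2 = 4`. The function name `cosine_lr_with_warmup` is hypothetical.

```python
import math


def cosine_lr_with_warmup(step: int, base_lr: float = 1e-7,
                          warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Effective batch size: per-device batch x gradient accumulation x devices (1 device assumed).
train_batch_size, gradient_accumulation_steps, num_devices = 2, 2, 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step))
```

The learning rate peaks at 1e-07 at step 100, falls to half of that midway through the decay (step 550), and reaches 0 at step 1000.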

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.558         | 0.4   | 50   | 0.4553          | -0.0349        | -0.8002          | 0.4600             | 0.7653          | -49.0237       | -42.5545     | -2.9038         | -2.8422       |
| 0.3818        | 0.8   | 100  | 0.3747          | -0.1730        | -3.5887          | 0.4600             | 3.4157          | -76.9091       | -43.9359     | -2.8759         | -2.8145       |
| 0.3123        | 1.2   | 150  | 0.3744          | -0.2403        | -4.3676          | 0.4600             | 4.1273          | -84.6980       | -44.6088     | -2.8742         | -2.8132       |
| 0.364         | 1.6   | 200  | 0.3744          | -0.2016        | -4.5800          | 0.4600             | 4.3784          | -86.8216       | -44.2215     | -2.8745         | -2.8130       |
| 0.4332        | 2.0   | 250  | 0.3743          | -0.2684        | -4.8731          | 0.4600             | 4.6046          | -89.7525       | -44.8898     | -2.8737         | -2.8118       |
| 0.3986        | 2.4   | 300  | 0.3743          | -0.1931        | -5.0362          | 0.4600             | 4.8430          | -91.3835       | -44.1367     | -2.8747         | -2.8125       |
| 0.3986        | 2.8   | 350  | 0.3743          | -0.1846        | -5.1505          | 0.4600             | 4.9659          | -92.5268       | -44.0517     | -2.8745         | -2.8120       |
| 0.4506        | 3.2   | 400  | 0.3743          | -0.1881        | -5.2928          | 0.4600             | 5.1047          | -93.9497       | -44.0868     | -2.8736         | -2.8107       |
| 0.4505        | 3.6   | 450  | 0.3743          | -0.2250        | -5.5587          | 0.4600             | 5.3337          | -96.6092       | -44.4557     | -2.8724         | -2.8094       |
| 0.4332        | 4.0   | 500  | 0.3743          | -0.4284        | -5.9879          | 0.4600             | 5.5595          | -100.9007      | -46.4892     | -2.8698         | -2.8066       |
| 0.3292        | 4.4   | 550  | 0.3743          | -0.3669        | -5.9892          | 0.4600             | 5.6223          | -100.9135      | -45.8741     | -2.8695         | -2.8063       |
| 0.3639        | 4.8   | 600  | 0.3743          | -0.2855        | -5.9594          | 0.4600             | 5.6739          | -100.6163      | -45.0607     | -2.8699         | -2.8066       |
| 0.4505        | 5.2   | 650  | 0.3743          | -0.3591        | -6.0896          | 0.4600             | 5.7305          | -101.9183      | -45.7970     | -2.8685         | -2.8052       |
| 0.4505        | 5.6   | 700  | 0.3743          | -0.3292        | -6.0868          | 0.4600             | 5.7576          | -101.8900      | -45.4977     | -2.8687         | -2.8054       |
| 0.3639        | 6.0   | 750  | 0.3743          | -0.3284        | -6.1008          | 0.4600             | 5.7724          | -102.0299      | -45.4898     | -2.8683         | -2.8049       |
| 0.2426        | 6.4   | 800  | 0.3743          | -0.3283        | -6.0983          | 0.4600             | 5.7700          | -102.0044      | -45.4881     | -2.8684         | -2.8051       |
| 0.5025        | 6.8   | 850  | 0.3743          | -0.3251        | -6.0987          | 0.4600             | 5.7737          | -102.0092      | -45.4562     | -2.8685         | -2.8051       |
| 0.3119        | 7.2   | 900  | 0.3743          | -0.3297        | -6.1009          | 0.4600             | 5.7712          | -102.0308      | -45.5028     | -2.8684         | -2.8050       |
| 0.3466        | 7.6   | 950  | 0.3743          | -0.3291        | -6.1017          | 0.4600             | 5.7727          | -102.0393      | -45.4965     | -2.8684         | -2.8050       |
| 0.3812        | 8.0   | 1000 | 0.3743          | -0.3291        | -6.1017          | 0.4600             | 5.7727          | -102.0393      | -45.4965     | -2.8684         | -2.8050       |
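The Epoch and Step columns imply about 125 optimizer steps per epoch. With an effective batch size of 4, that suggests each epoch covers roughly 500 training pairs and that the 1000 steps amount to about 8 passes over the data. This is only an estimate: it assumes a single device, ignores a possibly partial final batch, and the dataset size is not stated in this card. The sketch below re-derives it from a few table rows. It also finds where the validation loss first reaches its final value of 0.3743 (step 250, epoch 2.0).

```python
# (step, epoch, validation loss) rows copied from the training-results table.
rows = [
    (50, 0.4, 0.4553),
    (100, 0.8, 0.3747),
    (150, 1.2, 0.3744),
    (200, 1.6, 0.3744),
    (250, 2.0, 0.3743),
    (1000, 8.0, 0.3743),
]

# Optimizer steps per epoch implied by each row (rounded to absorb float noise).
implied = {round(step / epoch) for step, epoch, _ in rows}
assert len(implied) == 1, "rows disagree on steps per epoch"
steps_per_epoch = implied.pop()  # 125

total_train_batch_size = 4  # from the hyperparameters; assumes one device
approx_pairs_per_epoch = steps_per_epoch * total_train_batch_size  # about 500

# First logged step at which validation loss reaches its final value.
final_loss = rows[-1][2]
plateau_step = next(step for step, _, loss in rows if loss <= final_loss)  # 250

print(steps_per_epoch, approx_pairs_per_epoch, plateau_step)
```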


### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
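To check whether a local environment matches the versions listed above, a minimal sketch like the following can help. It uses the standard-library `importlib.metadata` and assumes the PyPI distribution names `transformers`, `torch` (for PyTorch), `datasets`, and `tokenizers`. `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card; "torch" is the distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.0.0+cu117",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print(version_mismatches() or "environment matches the card")
```

Comparison is by exact string, so a `torch` build reported as `2.0.0` without the `+cu117` suffix is flagged as a mismatch.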