
# mcqa_quant_BBQ

This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9790
- Rewards/chosen: -0.9912
- Rewards/rejected: -0.9047
- Rewards/accuracies: 0.5
- Rewards/margins: -0.0865
- Logps/rejected: -26.9191
- Logps/chosen: -28.3753
- Logits/rejected: -3.3916
- Logits/chosen: -3.3931
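
Since this is a PEFT adapter on top of TinyLlama-1.1B-Chat, a minimal loading sketch might look like the following. The adapter repo id is taken from this card's page; the MCQA-style prompt is a hypothetical placeholder, since the training data is not documented here.

```python
# Minimal sketch: load the PEFT adapter on top of the TinyLlama base model.
# The prompt format is a guess; the card does not document the training data.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "chrisswillss98/dpo_mcqa_quantizedBitsAndBytes",  # adapter repo id
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = "Question: ...\nA) ...\nB) ...\nC) ...\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```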

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
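
The rewards/* and logps/* metrics reported above follow TRL's DPOTrainer logging convention. Under that assumption, the listed hyperparameters map onto a `TrainingArguments` roughly as sketched below; the model, dataset, and DPO beta are not documented on this card and are left out.

```python
# Hedged sketch: the listed hyperparameters expressed as transformers'
# TrainingArguments. The Adam betas/epsilon above are the library defaults,
# so they need no explicit override.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mcqa_quant_BBQ",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=1,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
# The trainer wiring (e.g. trl.DPOTrainer(model, args=args, train_dataset=...))
# depends on the TRL version and is not documented here.
```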

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6638        | 1.7241 | 50   | 0.7089          | -0.0201        | 0.0054           | 0.3462             | -0.0254         | -17.8186       | -18.6642     | -3.5325         | -3.5335       |
| 0.5696        | 3.4483 | 100  | 0.7434          | -0.2832        | -0.2559          | 0.4615             | -0.0273         | -20.4312       | -21.2954     | -3.4911         | -3.4921       |
| 0.3751        | 5.1724 | 150  | 0.8633          | -0.5219        | -0.4777          | 0.4615             | -0.0443         | -22.6490       | -23.6828     | -3.4444         | -3.4462       |
| 0.2126        | 6.8966 | 200  | 1.0177          | -0.7687        | -0.6209          | 0.3846             | -0.1478         | -24.0814       | -26.1501     | -3.4037         | -3.4053       |
| 0.1764        | 8.6207 | 250  | 0.9790          | -0.9912        | -0.9047          | 0.5                | -0.0865         | -26.9191       | -28.3753     | -3.3916         | -3.3931       |

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- PyTorch 2.3.1+cu118
- Datasets 2.20.0
- Tokenizers 0.19.1
