# mcqa_quant_BBQ
This model is a version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 fine-tuned with Direct Preference Optimization (DPO) on an undocumented dataset; the name suggests multiple-choice QA pairs derived from BBQ. It achieves the following results on the evaluation set (the reward metrics are explained after the list):
- Loss: 0.9790
- Rewards/chosen: -0.9912
- Rewards/rejected: -0.9047
- Rewards/accuracies: 0.5000
- Rewards/margins: -0.0865
- Logps/rejected: -26.9191
- Logps/chosen: -28.3753
- Logits/rejected: -3.3916
- Logits/chosen: -3.3931
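These are standard DPO evaluation metrics. Assuming the usual trl `DPOTrainer` logging convention (the card does not state it), the implicit reward of a completion $y$ for a prompt $x$ is

$$r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},$$

where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ is the DPO temperature (not reported in this card). `Rewards/margins` is the mean of $r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})$, and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen completion receives the higher reward. A negative final margin (-0.0865) together with an accuracy of 0.5 indicates the model does not reliably prefer the chosen responses on this evaluation set.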
## Model description
More information needed
## Intended uses & limitations
More information needed
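Usage is not documented either, but since the repo ships a PEFT adapter and the repo name mentions bitsandbytes quantization, a minimal loading sketch might look like the following. The 4-bit settings are assumptions, not values from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "chrisswillss98/dpo_mcqa_quantizedBitsAndBytes"

# Assumed 4-bit quantization; the exact config used in training is not documented.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter

prompt = "Answer the multiple-choice question with A, B, or C: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```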
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative `TrainingArguments` mapping follows the list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
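These values were presumably passed to a trl `DPOTrainer` (the trl version is not listed in the card). Below is a minimal sketch of how they map onto Hugging Face `TrainingArguments`; everything not listed above, including `output_dir`, is a placeholder rather than a value from the card:

```python
from transformers import TrainingArguments

# Illustrative only: reproduces the listed hyperparameters; all other
# settings (output dir, logging, precision, DPO beta, ...) are unknown.
training_args = TrainingArguments(
    output_dir="dpo_mcqa_quantizedBitsAndBytes",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=1,
    seed=42,
    optim="adamw_torch",        # Adam(W) with betas=(0.9, 0.999), eps=1e-8 (the defaults)
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```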
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:--------------:|
| 0.6638        | 1.7241 | 50   | 0.7089          | -0.0201        | 0.0054           | 0.3462             | -0.0254         | -17.8186       | -18.6642     | -3.5325         | -3.5335        |
| 0.5696        | 3.4483 | 100  | 0.7434          | -0.2832        | -0.2559          | 0.4615             | -0.0273         | -20.4312       | -21.2954     | -3.4911         | -3.4921        |
| 0.3751        | 5.1724 | 150  | 0.8633          | -0.5219        | -0.4777          | 0.4615             | -0.0443         | -22.6490       | -23.6828     | -3.4444         | -3.4462        |
| 0.2126        | 6.8966 | 200  | 1.0177          | -0.7687        | -0.6209          | 0.3846             | -0.1478         | -24.0814       | -26.1501     | -3.4037         | -3.4053        |
| 0.1764        | 8.6207 | 250  | 0.9790          | -0.9912        | -0.9047          | 0.5000             | -0.0865         | -26.9191       | -28.3753     | -3.3916         | -3.3931        |
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.1+cu118
- Datasets 2.20.0
- Tokenizers 0.19.1