Trained for one epoch on ultrafeedback_binarized using cDPO. Full evaluation is pending.
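For readers unfamiliar with cDPO (conservative DPO), here is a minimal sketch of the per-example loss. It assumes the usual formulation, in which the DPO objective is smoothed by a label-noise probability ε. The function names and the values `beta=0.1` and `label_smoothing=0.1` are illustrative assumptions; the hyperparameters actually used to train this model are not stated here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not from the model card
    label_smoothing: float = 0.1,  # epsilon; assumed value, not from the model card
) -> float:
    """Conservative DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    completions under the policy and the frozen reference model.
    """
    # Implicit reward margin between chosen and rejected completions.
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # With probability epsilon the preference label is treated as flipped.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(margin)
        - label_smoothing * log_sigmoid(-margin)
    )
```

Setting `label_smoothing=0.0` recovers the standard DPO loss. With a positive ε the loss is minimized at a finite margin, so the policy is not pushed toward unbounded confidence on pairs whose labels may be noisy.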

Some initial benchmark results:

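| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| hellaswag | 0 | acc | 0.6621 | ± | 0.0047 |
| | | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± | 0.0141 |
| | | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± | 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± | 0.0136 |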
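As a reading aid for the Stderr column, the sketch below turns each reported value into an approximate 95% confidence interval. It assumes the standard errors can be treated with a normal approximation (z = 1.96); the helper name `confidence_interval` is hypothetical and not part of any evaluation tool.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval under a normal approximation (assumed)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (value, stderr) pairs copied from the results table above.
results = {
    "hellaswag acc_norm": (0.8525, 0.0035),
    "arc_challenge acc_norm": (0.6698, 0.0137),
    "winogrande acc": (0.7861, 0.0115),
    "gsm8k acc": (0.5694, 0.0136),
}

for name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (95% CI {low:.4f} to {high:.4f})")
```

The smaller benchmarks (arc_challenge, winogrande, gsm8k) have intervals several points wide, so small differences against other models on those tasks should be read with caution.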

Model size: 7.24B params (Safetensors). Tensor type: BF16.
