---
license: mit
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
  - jondurbin/truthy-dpo-v0.1
  - argilla/distilabel-math-preference-dpo
  - argilla/distilabel-capybara-dpo-7k-binarized
language:
  - en
library_name: adapter-transformers
base_model: Technoculture/MT7Bi-sft
---

# Technoculture/MT7Bi-alpha-dpo-v-0.2
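
Below is a minimal inference sketch. It assumes this repo hosts a PEFT-style adapter on top of the `Technoculture/MT7Bi-sft` base model (as the `base_model` and `adapter-transformers` metadata suggest); if the weights are merged, load this repo directly with `AutoModelForCausalLM`. The prompt is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach this repo's adapter
# (assumption: the adapter is in PEFT format).
base = AutoModelForCausalLM.from_pretrained(
    "Technoculture/MT7Bi-sft", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Technoculture/MT7Bi-alpha-dpo-v-0.2")
tokenizer = AutoTokenizer.from_pretrained("Technoculture/MT7Bi-sft")

prompt = "What is the role of hemoglobin in the body?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```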

## Training Details

- GPU: Nvidia A100 Tensor Core GPU
- Total batches: 4266
- Epochs: 3
- Duration: 3 hours, 59 minutes, and 55 seconds
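
The sketch below shows how such a run could be wired up with TRL's `DPOTrainer`. Only the base model, the epoch count, and the datasets come from this card; TRL itself, the batch size, and all other hyperparameters are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Technoculture/MT7Bi-sft")
tokenizer = AutoTokenizer.from_pretrained("Technoculture/MT7Bi-sft")

# Stand-in for the full 11.38k-row mixture described in the next section;
# DPO expects "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="mt7bi-alpha-dpo",
    num_train_epochs=3,             # from this card
    per_device_train_batch_size=8,  # placeholder, not from this card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```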

## DPO Training Dataset Mixture

| Dataset Name | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
| --- | --- | --- | --- |
| argilla/distilabel-math-preference-dpo | 2.4k | 1.0 | 2.4k |
| argilla/distilabel-intel-orca-dpo-pairs | 12.9k | 0.5 | 6.45k |
| jondurbin/truthy-dpo-v0.1 | 1.04k | 1.0 | 1.04k |
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k | 0.2 | 1.5k |

**Total size: 11.38k rows**
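
The mixture ratios above can be applied with the `datasets` library, as in the sketch below. The shuffle seed and the shuffle-then-truncate subsampling are illustrative assumptions; the card does not state how rows were selected.

```python
from datasets import load_dataset

# Ratios from the table above.
ratios = {
    "argilla/distilabel-math-preference-dpo": 1.0,
    "argilla/distilabel-intel-orca-dpo-pairs": 0.5,
    "jondurbin/truthy-dpo-v0.1": 1.0,
    "argilla/distilabel-capybara-dpo-7k-binarized": 0.2,
}

subsets = {}
for name, ratio in ratios.items():
    ds = load_dataset(name, split="train").shuffle(seed=42)  # seed is an assumption
    n = int(len(ds) * ratio)
    subsets[name] = ds.select(range(n))
    print(f"{name}: {len(ds)} -> {n} rows")

print("total:", sum(len(ds) for ds in subsets.values()))  # ~11.4k, per the table

# Before training, each subset would still need to be mapped onto a shared
# "prompt"/"chosen"/"rejected" schema and concatenated.
```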

## Training Loss Plot

*(image: raw training loss curve)*

## Training Loss Smoothed Plot

*(image: smoothed training loss curve)*

For full details of this DPO training run, please go through our notebook.

Open In Colab