RyanYr/reward-judge_iter-dpo-genRM_pilot-exp_iter1
Tags: Safetensors, llama, trl, dpo, Generated from Trainer
License: llama3.1
Branch: main
reward-judge_iter-dpo-genRM_pilot-exp_iter1
Commit History
3cf8b0d  Model save (verified; RyanYr committed on Sep 13)
9c11f40  Training in progress, step 160, checkpoint (verified; RyanYr committed on Sep 13)
aa42b1e  Training in progress, step 160 (verified; RyanYr committed on Sep 13)
c9b82ad  Training in progress, step 150, checkpoint (verified; RyanYr committed on Sep 13)
6afa3ea  Training in progress, step 150 (verified; RyanYr committed on Sep 13)
7dc9959  initial commit (verified; RyanYr committed on Sep 13)