Difference between this and the other (100 steps) model?
I'm curious what the difference is between this model and the other one. The only difference I see is in the name: the "100 steps".
The "AALF/gemma-2-27b-it-SimPO-37K-100steps" model is a checkpoint of "AALF/gemma-2-27b-it-SimPO-37K" after training 100 global steps.
Is this model from before or after those 100 steps?
After. Refer to trainer_state.json.
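For anyone who wants to check the step count themselves, here is a minimal sketch. It assumes you have already downloaded trainer_state.json from the model repo to a local path, and that the file carries the `global_step` field that the Hugging Face Trainer normally writes. The helper name `read_global_step` is made up for illustration.

```python
import json
from pathlib import Path


def read_global_step(path):
    """Return the global_step value stored in a trainer_state.json file.

    Assumes the file follows the usual Hugging Face Trainer layout,
    where "global_step" is a top-level integer field.
    """
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    return state["global_step"]


# Example usage (hypothetical local path, download the file first):
#   print(read_global_step("trainer_state.json"))
```

Comparing the value from each repo's file should show how far along in training each checkpoint was saved.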
Which one should we use?
AALF/gemma-2-27b-it-SimPO-37K-100steps is better.