Difference between this and the other (100 steps) model?

by lemon07r - opened Aug 21

Aug 21

I'm curious what the difference is between this model and the other one. The only difference I see is in the name: the "100 steps".

AALF

Owner Aug 22

The "AALF/gemma-2-27b-it-SimPO-37K-100steps" model is a checkpoint of "AALF/gemma-2-27b-it-SimPO-37K" after training 100 global steps.

lemon07r

Aug 22

Is this model from before or after those 100 steps?

AALF

Owner Aug 24

After. Refer to trainer_state.json.
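
For anyone who wants to check this themselves, here is a minimal sketch. It assumes you have downloaded a local copy of trainer_state.json and that the file follows the usual layout written by the Hugging Face Trainer, where the step count is stored under the "global_step" key. The helper name read_global_step is hypothetical, used only for illustration.

```python
import json
from pathlib import Path


def read_global_step(path):
    """Return the step count recorded in a trainer_state.json file.

    Assumes the file has a top-level "global_step" key, as in the
    state files written by the Hugging Face Trainer.
    """
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    return state["global_step"]
```

Calling read_global_step("trainer_state.json") on the file from each repository should show how many steps each checkpoint was trained for.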

djuna

Sep 3

Which one should we use?

AALF

Owner Sep 3

AALF/gemma-2-27b-it-SimPO-37K-100steps is better.
