General discussion.

#1
by Lewdiculous - opened

General discussion and feedback.

Lewdiculous pinned discussion

After some more testing, the differences seem to be negligible.

The PPL is slightly lower with imatrix-with-rp-format-data.txt though, so eh.
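
For anyone following along, here is a minimal sketch of what the PPL number means: perplexity is the exponential of the mean negative log-likelihood per token. The function name and the sample log-probabilities are made up for illustration and are not taken from any run in this thread.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_logprobs: natural-log probabilities the model assigned to each
    reference token (hypothetical input, not output of a specific tool).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Illustrative values only: two evaluations over the same four tokens.
run_a = [math.log(p) for p in (0.30, 0.20, 0.25, 0.10)]
run_b = [math.log(p) for p in (0.31, 0.20, 0.25, 0.10)]

# A small bump in a single token's probability lowers PPL only slightly.
print(perplexity(run_a), perplexity(run_b))
```

Since the metric is an average over all evaluated tokens, small gaps between two imatrix files can easily sit within run-to-run noise.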

@Virt-io Interesting... From your perspective, would you say negligible compared to the original kalomaze groups_merged.txt data, or to your previous v4 data with excerpts of the NSFW dataset?

(The formatting and stuff.)

There is not much difference in formatting from the original groups_merged.txt.

The slightly lower perplexity is most likely from just having more data.

Sorry for wasting your time.

> There is not much difference in formatting from the original groups_merged.txt.

Well, crap, is it showing inconsistencies like you had?

> The slightly lower perplexity is most likely from just having more data.

Could be that or some randomization to the process. There's no consensus yet unfortunately.

> Sorry for wasting your time.

Hey, not at all! I am genuinely curious, haha.

If the formatting changes made no significant difference, then maybe just having more data did, for better or worse?

The additions were pretty conservative in quantity.

> Could be that or some randomization to the process. There's no consensus yet unfortunately.

Funny thing is, I moved the RP data to the top of the file as an experiment, and it made the PPL worse.
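
A rough sketch of that kind of reordering, in case someone wants to repeat it. It assumes the RP excerpts occupy one contiguous run of lines in the calibration file. The helper name and the toy line contents are hypothetical, not the actual script or data used here.

```python
def move_span_to_top(lines, start, end):
    """Return a new list with lines[start:end] moved to the front.

    start/end are 0-based, end-exclusive indices of the block to move
    (hypothetical helper, for illustration only).
    """
    if not 0 <= start <= end <= len(lines):
        raise ValueError("span out of range")
    return lines[start:end] + lines[:start] + lines[end:]

# Toy stand-in for a calibration file: general text first, RP data last.
calibration = ["general 1", "general 2", "rp 1", "rp 2"]
reordered = move_span_to_top(calibration, 2, 4)
print(reordered)  # ['rp 1', 'rp 2', 'general 1', 'general 2']
```

Comparing PPL before and after a swap like this, with everything else held fixed, is one way to check whether ordering matters or whether the difference is just noise.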

I think a better use of our time will be trying to make a multilingual imatrix.

Some models just have broken formatting, and that is something imatrix can't fix.
