tokenizer file change?
I noticed you uploaded a different tokenizer.json after releasing the model. Would that impact the performance of any GGUFs made before that change?
Good fine-tunes so far, by the way.
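For context, here's how one could check what a given GGUF actually carries, since (if I understand correctly) the tokenizer gets baked in at conversion time. A minimal sketch assuming the gguf Python package from llama.cpp's gguf-py; "model.gguf" is a placeholder path:

```python
from gguf import GGUFReader  # pip install gguf

# The tokenizer is embedded in the GGUF when it's converted, so a later
# tokenizer.json upload on the repo doesn't change already-made files.
reader = GGUFReader("model.gguf")  # placeholder path

# List the tokenizer-related metadata keys stored in the file.
for name, field in reader.fields.items():
    if name.startswith("tokenizer."):
        print(name)

# For array fields like the vocab, len(field.data) is the element count.
tokens = reader.fields.get("tokenizer.ggml.tokens")
if tokens is not None:
    print("embedded vocab size:", len(tokens.data))
```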
Hopefully it shouldn't. The old one was functional, but the tokenizer Axolotl generated was duplicated for some reason. As in, it had its contents repeated twice, and hence it was twice the size.
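If anyone wants to sanity-check a downloaded copy, here's a rough sketch, assuming the duplication was a whole-file byte repeat (which is just my guess at what Axolotl did):

```python
import json

# Quick check for a duplicated tokenizer.json; assumes the bug was the
# whole file's bytes being written twice (a guess, not confirmed).
raw = open("tokenizer.json", "rb").read()

try:
    json.loads(raw)
    print("parses as a single JSON document - no whole-file duplication")
except json.JSONDecodeError as e:
    # 'Extra data' means valid JSON followed by more bytes, which is
    # what an appended duplicate of the file would look like.
    half = len(raw) // 2
    if raw[:half] == raw[half:]:
        print("the file is its own contents repeated twice")
    else:
        print(f"unexpected trailing data at byte {e.pos}")
```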
Fair enough. Yeah, I doubt it had any impact. For what it's worth, I haven't noticed anything out of the ordinary while running the GGUFs so far.
Thanks for your answer.
Thankfully, from what I know of llama.cpp, the tokenizer setup isn't taken at face value from the model repo but matched against a sort of built-in "archive" of known tokenizers. That's why early Llama 3.1 models had broken tokenizers: they were detected as Smaug BPE. But this is all from memory, so of course I may be wrong. Thanks for the feedback!
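To illustrate (again from memory, so treat this as a sketch rather than the real code): llama.cpp's convert_hf_to_gguf.py encodes a fixed probe string with the model's tokenizer and matches a hash of the result against a hardcoded list of known pre-tokenizers. The hashes, probe text, and model id below are placeholders:

```python
import hashlib
from transformers import AutoTokenizer

# Hypothetical stand-ins for llama.cpp's hardcoded hash -> name table.
KNOWN_PRE_TOKENIZERS = {
    "placeholder-hash-1": "llama-bpe",
    "placeholder-hash-2": "smaug-bpe",
}

# The real script uses a long, fixed probe text; this is a stand-in.
probe = "Hello World! éè 123 ......"
tok = AutoTokenizer.from_pretrained("some/model")  # placeholder model id

# Hash the token ids the tokenizer produces for the probe.
chkhsh = hashlib.sha256(str(tok.encode(probe)).encode()).hexdigest()

# A wrong match here (e.g. Smaug BPE for early Llama 3.1) means the
# wrong pre-tokenizer gets baked into every GGUF converted that way.
print(KNOWN_PRE_TOKENIZERS.get(chkhsh, "unknown pre-tokenizer"))
```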