Tokenizer padding token

#76 by Rish1

I'm attempting to fine-tune Llama 3.1 8B Instruct, but in config.json and tokenizer.json there seems to be no padding token assigned, which is odd to me. For batched tokenizer calls, do I just pad with the EOS token?
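
For reference, here's a minimal check of what I'm seeing (the repo id below is an assumption; adjust it to whichever checkpoint you're loading):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(tokenizer.pad_token)  # None -- no padding token is configured
print(tokenizer.eos_token)  # the EOS token is configured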

Please share more context and error logs.

As I understand it, no padding token is defined. If that's the case, please try this.

Note: do this before training.

# If no pad token is defined, reuse the EOS token for padding.
if not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.eos_token
    print(f"tokenizer.pad_token set to {tokenizer.eos_token}")
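
With the pad token set, batched tokenization works as usual. A minimal sketch (the example texts are made up):

texts = ["Short prompt.", "A somewhat longer second prompt."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)    # (2, longest sequence in the batch)
print(batch["attention_mask"])     # 0s mark the padded positions

If you train with a causal LM objective, also make sure your collator masks the padded positions (attention_mask of 0, labels of -100) so the reused EOS token doesn't contribute to the loss.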
