Tokenizer padding token

#76 by Rish1

I'm attempting to fine-tune Llama 3.1 8B Instruct, but in config.json and tokenizer.json there seems to be no padding token assigned, which is odd to me. For batched tokenizer calls, do I just pad with the EOS token?
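
For reference, here's a minimal check of what I'm seeing (the repo id below is an assumption; adjust it to whichever checkpoint you're loading):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(tokenizer.pad_token)  # None -- no padding token is configured
print(tokenizer.eos_token)  # the EOS token is configured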

Please share more context and error logs.

As I understand it, no padding token is defined. If that's the case, please try this.

Note: do this before training.

# If no pad token is defined, reuse the EOS token for padding.
if not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.eos_token
    print(f"tokenizer.pad_token set to {tokenizer.eos_token}")
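
With the pad token set, batched tokenization works as usual. A minimal sketch (the example texts are made up):

texts = ["Short prompt.", "A somewhat longer second prompt."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)    # (2, longest sequence in the batch)
print(batch["attention_mask"])     # 0s mark the padded positions

If you train with a causal LM objective, also make sure your collator masks the padded positions (attention_mask of 0, labels of -100) so the reused EOS token doesn't contribute to the loss.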
