microsoft/Phi-3.5-mini-instruct · tokenizer.model_max_length=2048 in sample

Hello and thx for the great work!

Based on my experience, setting tokenizer.model_max_length=2048 in the sample could be wrong.
The parameter max_seq_length=2048 in SFTTrainer (https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/64963004ad95869fa73a30279371c8778509ac84/sample_finetune.py#L189)
already takes care of truncating longer examples during fine-tuning.
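
To make the point concrete, here is a minimal pure-Python sketch of what I mean by truncation to max_seq_length. It is not the actual SFTTrainer code: truncate_example is a hypothetical helper, and the integer lists are stand-ins for tokenized examples. It only shows that a length cap applied at the trainer level already bounds every example, so a second cap on the tokenizer should not be needed.

```python
# Hypothetical helper, NOT the TRL implementation: it only illustrates
# the effect of capping each tokenized example at max_seq_length tokens.
def truncate_example(token_ids, max_seq_length=2048):
    """Keep at most max_seq_length leading tokens."""
    return token_ids[:max_seq_length]


# Stand-ins for tokenized training examples (assumed data, not real token ids).
long_example = list(range(3000))   # longer than the limit
short_example = list(range(100))   # shorter than the limit

truncated = truncate_example(long_example)
untouched = truncate_example(short_example)

print(len(truncated), len(untouched))
```

Examples shorter than the limit pass through unchanged; only the longer ones are cut.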

Also, when setting tokenizer.model_max_length=2048, I experienced strange errors that prevented the fine-tuning process.

LMK what you think...
