HF transformers fine-tuning code hangs with Llama3?
#48
by teddyyyy123
We used our existing fine-tuning code, which worked with the llama1 and llama2 base models:
```python
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,
    callbacks=[ManifoldTensorBoardLoggerCallback()],
)
trainer.train()
```
But once the trainer starts fine-tuning from a llama3-8B base model, it barely makes any progress: it prints the 0% progress status once and then never updates it, even after 5 hours. Previously, with llama2-7B, it ran through 40% of our examples within 25 minutes.
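For anyone trying to see where a run like this is stuck, here is a minimal diagnostic sketch using the standard-library faulthandler module, assuming a single-process run (the 300-second interval is arbitrary):

```python
import sys
import faulthandler

# Periodically dump every thread's Python stack to stderr so the logs show
# what the trainer is blocked on if the progress bar stops updating.
faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)

trainer.train()

# Stop the periodic stack dumps once training completes.
faulthandler.cancel_dump_traceback_later()
```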
Yes, I am also experiencing this issue.
I'm able to fine-tune Llama3 using Accelerate and DeepSpeed ZeRO-2. However, the resulting model doesn't know how to stop generating properly: it spews garbage after answering my question until max_new_tokens is reached, just like Phi-2. The same training script works flawlessly with Phi-3 and Mistral-7B, though.
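One possible cause (a guess, not verified against this setup): Llama 3's chat format ends each turn with the `<|eot_id|>` token rather than the plain end-of-text token, so if the fine-tuned model learned to emit `<|eot_id|>` but generation only stops on `tokenizer.eos_token_id`, it will keep producing tokens until max_new_tokens. A minimal sketch of passing both as stop tokens at generation time (`input_ids` and `max_new_tokens` stand in for whatever the evaluation script already uses):

```python
# Treat both the base eos token and Llama 3's end-of-turn token <|eot_id|>
# as terminators so generation can stop at the end of the assistant's turn.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
)
```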