HF transformers fine-tuning code hangs with Llama3?
#48
by teddyyyy123
We used our existing fine-tuning code, which worked with the llama1 and llama2 base models:
```python
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,
    callbacks=[ManifoldTensorBoardLoggerCallback()],
)
trainer.train()
```
But once the trainer starts fine-tuning from a llama3-8B base model, it barely makes any progress: it prints the 0% progress status once and then never updates it, even after 5 hours. Previously, with llama2-7B, it ran through 40% of our examples within 25 minutes.
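For anyone trying to see where a run like this is stuck, here is a minimal diagnostic sketch using the standard-library faulthandler module, assuming a single-process run (the 300-second interval is arbitrary):

```python
import sys
import faulthandler

# Periodically dump every thread's Python stack to stderr so the logs show
# what the trainer is blocked on if the progress bar stops updating.
faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)

trainer.train()

# Stop the periodic stack dumps once training completes.
faulthandler.cancel_dump_traceback_later()
```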
Yes, I am also experiencing this issue.
I'm able to fine-tune Llama3 using Accelerate and DeepSpeed ZeRO-2. However, the resulting model doesn't know how to stop generating properly: it spews garbage after answering my question until max_new_tokens is reached, just like Phi-2. The same training script works flawlessly with Phi-3 and Mistral-7B, though.
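One possible cause (a guess, not verified against this setup): Llama 3's chat format ends each turn with the `<|eot_id|>` token rather than the plain end-of-text token, so if the fine-tuned model learned to emit `<|eot_id|>` but generation only stops on `tokenizer.eos_token_id`, it will keep producing tokens until max_new_tokens. A minimal sketch of passing both as stop tokens at generation time (`input_ids` and `max_new_tokens` stand in for whatever the evaluation script already uses):

```python
# Treat both the base eos token and Llama 3's end-of-turn token <|eot_id|>
# as terminators so generation can stop at the end of the assistant's turn.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
)
```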