Does StarCoder use Megatron-LM?
The paper lists "Software Orchestration: Megatron-LM",
but code like the class GPTBigCodeBlock doesn't use Megatron.
Is anything wrong?
@senxiangms check out the pre-training code at https://github.com/bigcode-project/Megatron-LM.
I see, appreciated.
Should I look at Megatron-LM/examples/pretrain_bigcode_model.slurm as the entry point?
Thanks.
The pre-training was done in Megatron-LM, and then we converted the checkpoints to transformers, which uses GPTBigCodeBlock. If you're looking for the code to train the model in Megatron-LM, it's there, and the slurm script to launch the job is indeed Megatron-LM/examples/pretrain_bigcode_model.slurm. However, that script is specific to our cluster; otherwise you can just use transformers.
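As a minimal sketch of what the converted checkpoints map onto (assuming the `transformers` library is installed): the converted model is a GPT-BigCode model whose decoder layers are the GPTBigCodeBlock class mentioned above. The tiny config here is purely illustrative, not the real StarCoder configuration; the published checkpoint would instead be loaded with `AutoModelForCausalLM.from_pretrained("bigcode/starcoder")`.

```python
from transformers import GPTBigCodeConfig, GPTBigCodeForCausalLM

# Tiny, randomly initialized config for illustration only -- these
# hyperparameters are NOT StarCoder's actual values.
config = GPTBigCodeConfig(
    vocab_size=1024,
    n_positions=128,
    n_embd=64,
    n_layer=2,
    n_head=4,
)
model = GPTBigCodeForCausalLM(config)

# Each decoder layer in model.transformer.h is a GPTBigCodeBlock,
# i.e. the class the question above refers to.
print(type(model.transformer.h[0]).__name__)
```

This is only meant to show where GPTBigCodeBlock sits in the transformers-side architecture that the Megatron-LM checkpoints were converted into.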
Thanks, @loubnabnl . well explained.
@loubnabnl, where can I find the original Megatron-LM pre-trained StarCoder checkpoint, instead of the converted one published on the Hugging Face Hub?
Thanks.
How do I convert the checkpoints to transformers?
Did you solve it?