Training or Fine-tuning the Bloom AI Model on my own Dataset
Hello everyone! I have a question for you, dear community.
How can I train the BLOOM AI model on my own training dataset?
Is there a function in BLOOM like "BloomSomeClass.train(inputs, outputs, params)"?
Thank you in advance for your answers!
Hi!
You fine-tune BLOOM the same way you fine-tune any other model on HF.
Consider the official example for text classification: https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification
In the README, you can find the line --model_name_or_path bert-base-multilingual-cased \
If you replace it with --model_name_or_path bigscience/bloom-560m \
you will fine-tune the smallest BLOOM model on the dataset in question. If you are doing something other than text classification, please browse ../examples/pytorch to find what works for you. Beware that if you want to train the largest BLOOM (bigscience/bloom), you will need several hundred gigabytes of GPU memory.
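To make the swap concrete, a full invocation might look roughly like this. The flags come from the linked run_glue.py text-classification example; the task name, hyperparameters, and output directory are placeholders you should adjust for your own dataset:

```shell
# Sketch of the text-classification example command from the linked README,
# with the model swapped to the smallest BLOOM checkpoint.
# Task name, batch size, learning rate, etc. are illustrative defaults.
python run_glue.py \
  --model_name_or_path bigscience/bloom-560m \
  --task_name mrpc \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./bloom-560m-finetuned
```

To train on your own data instead of a GLUE task, the same script accepts --train_file and --validation_file in place of --task_name, as described in the example's README.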
If you want to do that in a modest setup, you can try https://github.com/bigscience-workshop/petals for distributed training.
Justheuristic, thank you very much for your answer! I am doing text generation for my project and would like to fine-tune BLOOM on my own dataset. In that case, should I browse the ../examples/pytorch directory you kindly linked to find the relevant example?
Thank you very much !