Training or Fine-tuning the Bloom AI Model on my own Dataset
Hello everyone! I have a question for you, dear community.
How can I train the BLOOM AI model on my own training dataset?
Is there a function in BLOOM like "BloomSomeClass.train(inputs, outputs, params)"?
Thank you in advance for your answers!
Hi!
You fine-tune BLOOM the same way you fine-tune any other model on HF.
Consider the official example for text classification: https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification
In the README, you can find the line --model_name_or_path bert-base-multilingual-cased \
If you replace it with --model_name_or_path bigscience/bloom-560m \
you will fine-tune the smallest BLOOM model on the dataset in question. If you are doing something other than text classification, please browse ../examples/pytorch to find what works for you. Beware that if you want to train the largest BLOOM (bigscience/bloom), you will need several hundred gigabytes of GPU memory.
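To make the swap concrete, a full invocation might look roughly like this. The flags come from the linked run_glue.py text-classification example; the task name, hyperparameters, and output directory are placeholders you should adjust for your own dataset:

```shell
# Sketch of the text-classification example command from the linked README,
# with the model swapped to the smallest BLOOM checkpoint.
# Task name, batch size, learning rate, etc. are illustrative defaults.
python run_glue.py \
  --model_name_or_path bigscience/bloom-560m \
  --task_name mrpc \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./bloom-560m-finetuned
```

To train on your own data instead of a GLUE task, the same script accepts --train_file and --validation_file in place of --task_name, as described in the example's README.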
If you want to do that in a modest setup, you can try https://github.com/bigscience-workshop/petals for distributed training.
Justheuristic, thank you very much for your answer! I am doing text generation for my project and would like to fine-tune BLOOM on my own dataset. In that case, should I browse the ../examples/pytorch directory you kindly linked to find the relevant example?
Thank you very much !