---
license: apache-2.0
language:
- en
library_name: transformers
---

<div align="center"><img src="https://github.com/BudEcosystem/boomer/blob/main/assets/boomer-logo.png" width=200></div>


<p align="center"><i>Democratizing access to LLMs for the open-source community.<br>Let's advance AI, together. </i></p>

----

## Introduction 🎉

We are open-sourcing one of our early pretraining experiments with a custom architecture and dataset. This 1.1B-parameter model was pre-trained from scratch on a custom-curated dataset of 41B tokens. The architectural experiments include the addition of flash attention and a larger intermediate dimension for the MLP layer. The dataset is a combination of wiki, stories, arXiv, math and code data. The model is available on Hugging Face as [Boomer1B](https://huggingface.co/budecosystem/boomer-1b).

<div align="center"><img src="https://github.com/BudEcosystem/boomer/blob/main/assets/boomer-arch.jpg" width=500></div>

## Getting Started on GitHub 💻

Ready to dive in? Here's how you can get started with our models on GitHub.

Install the necessary dependencies with the following command:

```bash
pip install -r requirements.txt
```

### Generate responses

You can generate responses with the provided generate.py script, which loads the model from the Hugging Face model hub and runs inference on a specified prompt. Here's an example of usage:

```bash
python generate.py --base_model 'budecosystem/boomer-1b' --prompt="the president of India is"
```

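If you prefer to call the model directly from Python instead of going through generate.py, the following is a minimal sketch using the Transformers API. It assumes the checkpoint loads through the standard `AutoModelForCausalLM`/`AutoTokenizer` classes; adjust dtype and device handling to your hardware.

```python
# Minimal sketch: load Boomer-1B from the Hugging Face Hub and generate text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "budecosystem/boomer-1b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # drop or use torch.float32 on CPU
    device_map="auto",           # requires `accelerate`; remove to load on a single device
)

prompt = "the president of India is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a short continuation; tune max_new_tokens / temperature as needed.
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
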
### Fine-tuning 🎯

You can fine-tune the model on your own data using the provided training script. Here's an example command:

```bash
torchrun --nproc_per_node 4 train.py \
   --base_model budecosystem/boomer-1b \
   --data_path dataset.json \
   --output_dir output \
   --per_device_train_batch_size 2 \
   --gradient_accumulation_steps 2 \
   --num_train_epochs 1 \
   --learning_rate 2e-5 \
   --fp16 True \
   --logging_steps 10 \
   --deepspeed ds_config.json
```

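For a rough idea of what such a run does under the hood, here is a hypothetical sketch using the Hugging Face Trainer with the same hyperparameters. The repository's own script may differ, and the dataset handling below (a JSON file whose records have a "text" field) is an assumption for illustration only.

```python
# Hypothetical sketch of causal-LM fine-tuning with the Hugging Face Trainer,
# mirroring the command-line hyperparameters above.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "budecosystem/boomer-1b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # assumption: no dedicated pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Assumption: dataset.json contains records with a "text" field.
dataset = load_dataset("json", data_files="dataset.json")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=4096),
    batched=True, remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    learning_rate=2e-5,
    fp16=True,
    logging_steps=10,
    deepspeed="ds_config.json",  # optional; requires a DeepSpeed config file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
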
## Model details

| Parameter | Value |
| :------------- | :----: |
| n_layers | 4 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 32000 |
| sequence length | 4096 |
| intermediate size | 11008 |

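To cross-check these numbers against the published checkpoint, you can inspect its config from the Hub. The attribute names below assume a standard causal-LM (Llama-style) config class and may differ for this architecture:

```python
# Illustrative check of the table above against the published config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("budecosystem/boomer-1b")
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.hidden_size)
print(cfg.vocab_size, cfg.max_position_embeddings, cfg.intermediate_size)
```
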
### Tokenizer

We used the SentencePiece tokenizer during the fine-tuning process. This tokenizer is known for its capability to handle open-vocabulary language tasks efficiently.

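As a quick illustrative check of how the tokenizer splits text (assuming it loads through AutoTokenizer):

```python
# Tokenize a sample sentence with the model's SentencePiece-based tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("budecosystem/boomer-1b")
ids = tok("Democratizing access to LLMs for the open-source community.")["input_ids"]
print(len(ids), tok.convert_ids_to_tokens(ids)[:10])
```
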
### Training details

The model was trained on 4 A100 80GB GPUs for approximately 250 hours.

| Hyperparameter | Value |
| :----------------------------| :-----: |
| per_device_train_batch_size | 2 |
| gradient_accumulation_steps | 2 |
| learning_rate | 2e-4 |
| optimizer | adamw |
| betas | 0.9, 0.95 |
| fp16 | True |
| GPU | 4 A100 80GB |

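For reference, these settings imply an effective batch size of per_device_train_batch_size × gradient_accumulation_steps × GPUs = 2 × 2 × 4 = 16 sequences, i.e. roughly 16 × 4096 ≈ 65K tokens per optimizer step (a back-of-the-envelope estimate, not an official figure).
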
## Evaluations

We have evaluated the pre-trained model on a few standard benchmarks:

| Model Name | ARC | MMLU | Human Eval | Hellaswag | BBH | DROP | GSM8K |
|:----------:|:--------:|:----:|:----------:|:---------:|:-----:|:-----:|:----:|
| Boomer1B | 22.35 | 25.92 | 6.1 | 31.66 | 28.65 | 6.13 | 1.5 |

### Why use BOOMER?

- Retrieval augmentation
- Inference at the edge
- Language modeling use cases

### Final thought on Boomer!

This isn't the end. It's just the beginning of a journey towards creating more advanced, more efficient, and more accessible language models. We invite you to join us on this exciting journey.

### Acknowledgements

We'd like to thank the open-source community and the researchers whose foundational work laid the path for BOOMER. A special shoutout to our dedicated team, who worked relentlessly to curate the dataset and fine-tune the model to perfection.