jdpressman committed • Commit 8d2472d
1 Parent(s): fc73b33

Add training details
README.md CHANGED
@@ -98,6 +98,17 @@ autoregressive language models and be useful to alignment and interpretability r
 
 ## Training procedure
 
+This model was trained on [a 1 billion token sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample) of RedPajama
+on 8x H100 GPUs for roughly 24 hours.
+
+Using the scripts in the MiniHF repo as they exist now the training commands were:
+
+accelerate launch train_vae_overlap.py --model "mistralai/Mistral-7B-v0.1"
+--preprocessed preprocessed_mistral --context 64 --output vae_64_overlap_mistral --batch-size 24
+
+accelerate launch train_vae_router.py --model "mistralai/Mistral-7B-v0.1"
+--preprocessed preprocessed_mistral --vae-context 64 --start-from vae_64_overlap_mistral
+--output vae_64_overlap_router_mistral --lr 1e-4 --batch-size 1
 
 The following `bitsandbytes` quantization config was used during training:
 - quant_method: bitsandbytes