adarshxs committed on
Commit 05c1d29 • 1 Parent(s): 6c52242

Upload updated model

Files changed (3)
  1. README.md +27 -8
  2. pytorch_model.bin +3 -0
  3. tokenizer.model +3 -0
README.md CHANGED
@@ -5,10 +5,11 @@ model-index:
   - name: out
   results: []
 ---
-### This is the Instruction Fine Tuned version of [Tiny Llama](https://github.com/jzhang38/TinyLlama) on [@Teknium1's](https://twitter.com/Teknium1) [openhermes](https://huggingface.co/datasets/teknium/openhermes) dataset.
 
-`"The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training has started on 2023-09-01."`
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.3.0`
@@ -51,6 +52,8 @@ gradient_accumulation_steps: 2
 micro_batch_size: 8
 num_epochs: 1
 optimizer: adamw_bnb_8bit
+adam_epsilon: 0.00001
+max_grad_norm: 1.0
 lr_scheduler: cosine
 learning_rate: 0.0002
 
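The two added keys are the likely fix for the diverging run recorded further down: a larger Adam epsilon plus global gradient-norm clipping. As a rough sketch of what they correspond to in plain PyTorch with `bitsandbytes` (assuming a CUDA environment; the toy model and data here are placeholders, not this repo's code):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16).cuda()  # stand-in for the 1.1B TinyLlama

# optimizer: adamw_bnb_8bit, learning_rate: 0.0002, adam_epsilon: 0.00001
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4, eps=1e-5)

loss = model(torch.randn(8, 16).cuda()).pow(2).mean()
loss.backward()

# max_grad_norm: 1.0 -- rescale gradients so their global L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```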
 
@@ -86,9 +89,25 @@ special_tokens:
 
 </details><br>
 
+# out
 
-The loss for the 3T checkpoint explodes for some reason
-![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/644bf6ef778ecbfb977e8e84/06bfkeS7cPoHxkeIHe5M7.jpeg)
+This model was trained from scratch on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3647
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
 
 ### Training hyperparameters
 
@@ -102,7 +121,7 @@ The following hyperparameters were used during training:
 - gradient_accumulation_steps: 2
 - total_train_batch_size: 128
 - total_eval_batch_size: 64
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-05
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
 - num_epochs: 1
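The epsilon bump from 1e-08 to 1e-05 is the substantive change here. In Adam's update, `theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)`, epsilon bounds the step size whenever the second-moment estimate is near zero, a standard remedy for the kind of loss explosion the old card noted. A toy calculation (illustrative numbers only, not values from this run):

```python
# Per-parameter Adam step: lr * m_hat / (sqrt(v_hat) + eps).
# When sqrt(v_hat) is tiny, eps alone decides how large the step gets.
lr, m_hat, sqrt_v_hat = 2e-4, 1e-6, 1e-6
for eps in (1e-8, 1e-5):
    step = lr * m_hat / (sqrt_v_hat + eps)
    print(f"eps={eps:.0e} -> step {step:.2e}")
# eps=1e-08 -> step 1.98e-04  (almost the full learning rate)
# eps=1e-05 -> step 1.82e-08  (four orders of magnitude smaller)
```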
@@ -113,9 +132,9 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
 | 3.0006        | 0.0   | 1    | 1.6838          |
-| 0.855         | 0.25  | 451  | 1.5228          |
-| 6.8636        | 0.5   | 902  | 7.4147          |
-| 6.9346        | 0.75  | 1353 | 7.4061          |
+| 0.8195        | 0.25  | 451  | 1.4620          |
+| 0.6836        | 0.5   | 902  | 1.4158          |
+| 0.6811        | 0.75  | 1353 | 1.3647          |
 
 
 ### Framework versions
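With the corrected optimizer settings, the validation loss now falls steadily to 1.3647 instead of diverging past 7.4. Once the weights uploaded below are in place, the checkpoint should load like any Llama-architecture model; a minimal sketch, assuming `transformers` is installed (the repo id and the prompt template are placeholders, not confirmed by this commit):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/tinyllama-openhermes"  # placeholder: substitute the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "### Instruction:\nName three primary colors.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```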
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc780bb0faf671a3cb8409e7b4aab151cf6c760ad7ebe2748a189370924e3bfb
+size 2200123773
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723
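Both binaries are stored as Git LFS pointers: the three lines in each hunk are the pointer file itself (spec version, SHA-256 of the payload, byte size), not the data, so the ~2.2 GB `pytorch_model.bin` lives in LFS storage. A minimal sketch for fetching the resolved files without a full clone, using `huggingface_hub` (the repo id is a placeholder):

```python
from huggingface_hub import hf_hub_download

repo_id = "user/tinyllama-openhermes"  # placeholder: substitute the actual Hub repo id
for filename in ("pytorch_model.bin", "tokenizer.model"):
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(path)  # local cache path; the LFS pointer is resolved to the real file
```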