adarshxs committed on
Commit 05c1d29 • 1 Parent(s): 6c52242

Upload updated model

Files changed (3)
  1. README.md +27 -8
  2. pytorch_model.bin +3 -0
  3. tokenizer.model +3 -0
README.md CHANGED
@@ -5,10 +5,11 @@ model-index:
   - name: out
   results: []
 ---
-### This is the Instruction Fine Tuned version of [Tiny Llama](https://github.com/jzhang38/TinyLlama) on [@Teknium1's](https://twitter.com/Teknium1) [openhermes](https://huggingface.co/datasets/teknium/openhermes) dataset.
 
-`"The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training has started on 2023-09-01."`
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.3.0`
@@ -51,6 +52,8 @@ gradient_accumulation_steps: 2
 micro_batch_size: 8
 num_epochs: 1
 optimizer: adamw_bnb_8bit
+adam_epsilon: 0.00001
+max_grad_norm: 1.0
 lr_scheduler: cosine
 learning_rate: 0.0002
 
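The two added keys are the likely fix for the diverging run recorded further down: a larger Adam epsilon plus global gradient-norm clipping. As a rough sketch of what they correspond to in plain PyTorch with `bitsandbytes` (assuming a CUDA environment; the toy model and data here are placeholders, not this repo's code):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16).cuda()  # stand-in for the 1.1B TinyLlama

# optimizer: adamw_bnb_8bit, learning_rate: 0.0002, adam_epsilon: 0.00001
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4, eps=1e-5)

loss = model(torch.randn(8, 16).cuda()).pow(2).mean()
loss.backward()

# max_grad_norm: 1.0 -- rescale gradients so their global L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```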
 
@@ -86,9 +89,25 @@ special_tokens:
 
 </details><br>
 
+# out
 
-The loss for the 3T checkpoint explodes for some reason
-![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/644bf6ef778ecbfb977e8e84/06bfkeS7cPoHxkeIHe5M7.jpeg)
+This model was trained from scratch on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3647
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
 
 ### Training hyperparameters
 
@@ -102,7 +121,7 @@ The following hyperparameters were used during training:
 - gradient_accumulation_steps: 2
 - total_train_batch_size: 128
 - total_eval_batch_size: 64
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-05
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
 - num_epochs: 1
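The epsilon bump from 1e-08 to 1e-05 is the substantive change here. In Adam's update, `theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)`, epsilon bounds the step size whenever the second-moment estimate is near zero, a standard remedy for the kind of loss explosion the old card noted. A toy calculation (illustrative numbers only, not values from this run):

```python
# Per-parameter Adam step: lr * m_hat / (sqrt(v_hat) + eps).
# When sqrt(v_hat) is tiny, eps alone decides how large the step gets.
lr, m_hat, sqrt_v_hat = 2e-4, 1e-6, 1e-6
for eps in (1e-8, 1e-5):
    step = lr * m_hat / (sqrt_v_hat + eps)
    print(f"eps={eps:.0e} -> step {step:.2e}")
# eps=1e-08 -> step 1.98e-04  (almost the full learning rate)
# eps=1e-05 -> step 1.82e-08  (four orders of magnitude smaller)
```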
@@ -113,9 +132,9 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
 | 3.0006        | 0.0   | 1    | 1.6838          |
-| 0.855         | 0.25  | 451  | 1.5228          |
-| 6.8636        | 0.5   | 902  | 7.4147          |
-| 6.9346        | 0.75  | 1353 | 7.4061          |
+| 0.8195        | 0.25  | 451  | 1.4620          |
+| 0.6836        | 0.5   | 902  | 1.4158          |
+| 0.6811        | 0.75  | 1353 | 1.3647          |
 
 
 ### Framework versions
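With the corrected optimizer settings, the validation loss now falls steadily to 1.3647 instead of diverging past 7.4. Once the weights uploaded below are in place, the checkpoint should load like any Llama-architecture model; a minimal sketch, assuming `transformers` is installed (the repo id and the prompt template are placeholders, not confirmed by this commit):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/tinyllama-openhermes"  # placeholder: substitute the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "### Instruction:\nName three primary colors.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```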
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc780bb0faf671a3cb8409e7b4aab151cf6c760ad7ebe2748a189370924e3bfb
+size 2200123773
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723
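Both binaries are stored as Git LFS pointers: the three lines in each hunk are the pointer file itself (spec version, SHA-256 of the payload, byte size), not the data, so the ~2.2 GB `pytorch_model.bin` lives in LFS storage. A minimal sketch for fetching the resolved files without a full clone, using `huggingface_hub` (the repo id is a placeholder):

```python
from huggingface_hub import hf_hub_download

repo_id = "user/tinyllama-openhermes"  # placeholder: substitute the actual Hub repo id
for filename in ("pytorch_model.bin", "tokenizer.model"):
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(path)  # local cache path; the LFS pointer is resolved to the real file
```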