End of training
Files changed:
- README.md +33 -28
- adapter_model.bin +2 -2
README.md CHANGED

@@ -16,20 +16,22 @@ should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 <details><summary>See axolotl config</summary>
 
-axolotl version: `0.…`
+axolotl version: `0.4.0`
 ```yaml
 base_model: codellama/CodeLlama-7b-hf
 model_type: LlamaForCausalLM
 tokenizer_type: CodeLlamaTokenizer
 is_llama_derived_model: true
 
-…
-…
+hub_model_id: noeloco/modeltest1
+
+load_in_8bit: false
+load_in_4bit: true
 strict: false
 
 datasets:
-  - path: /tmp/fizzbuzz-ft/…
-    data_files: fizzbuzz-…
+  - path: /tmp/fizzbuzz-ft/datasets
+    data_files: /tmp/fizzbuzz-ft/datasets/training-set-alpaca.json
     type: alpaca
     ds_type: json
 
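For readers unfamiliar with the dataset settings in the hunk above: axolotl's `type: alpaca` loader with `ds_type: json` consumes records with `instruction`, `input`, and `output` fields. Below is a minimal sketch of what one record in `training-set-alpaca.json` could look like; the fizzbuzz wording is an illustrative assumption, since the dataset itself is not part of this diff.

```python
import json

# Hypothetical example record in the alpaca instruction/input/output schema.
# The actual contents of training-set-alpaca.json are not shown in this commit.
record = {
    "instruction": "Write a Python function that prints fizzbuzz from 1 to n.",
    "input": "",
    "output": (
        "def fizzbuzz(n):\n"
        "    for i in range(1, n + 1):\n"
        "        if i % 15 == 0:\n"
        "            print('FizzBuzz')\n"
        "        elif i % 3 == 0:\n"
        "            print('Fizz')\n"
        "        elif i % 5 == 0:\n"
        "            print('Buzz')\n"
        "        else:\n"
        "            print(i)"
    ),
}

# Write a JSON list of such records, one common layout for ds_type: json.
with open("training-set-alpaca.json", "w") as f:
    json.dump([record], f, indent=2)
```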
@@ -38,16 +40,16 @@ push_dataset_to_hub: noeloco
 val_set_size: 0.05
 output_dir: ./lora-out
 chat_template: chatml
-…
+
 
 sequence_len: 2048
 sample_packing: false
 pad_to_sequence_len: true
 
-adapter: …
+adapter: qlora
 lora_model_dir:
-lora_r: …
-lora_alpha: …
+lora_r: 16
+lora_alpha: 8
 lora_dropout: 0.05
 lora_target_linear: true
 lora_fan_in_fan_out:
@@ -58,10 +60,10 @@ wandb_watch:
 wandb_name:
 wandb_log_model:
 
-gradient_accumulation_steps: …
+gradient_accumulation_steps: 1
 micro_batch_size: 2
-num_epochs: …
-optimizer: …
+num_epochs: 3
+optimizer: paged_adamw_32bit
 lr_scheduler: cosine
 learning_rate: 0.0002
 
@@ -77,7 +79,7 @@ resume_from_checkpoint:
 local_rank:
 logging_steps: 1
 xformers_attention:
-flash_attention: …
+flash_attention: true
 
 warmup_steps: 10
 evals_per_epoch: 4
@@ -100,7 +102,7 @@ special_tokens:
 
 This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.…
+- Loss: 0.0295
 
 ## Model description
 
@@ -123,30 +125,33 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs: …
+- num_epochs: 3
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-…
-…
-…
-| 0.…
-| 0.…
-| 0.…
-| 0.…
+| 2.0177        | 0.01  | 1    | 2.5549          |
+| 0.603         | 0.26  | 18   | 0.8667          |
+| 0.3026        | 0.51  | 36   | 0.2340          |
+| 0.0977        | 0.77  | 54   | 0.1274          |
+| 0.1101        | 1.03  | 72   | 0.1098          |
+| 0.0503        | 1.29  | 90   | 0.0469          |
+| 0.0753        | 1.54  | 108  | 0.0516          |
+| 0.2285        | 1.8   | 126  | 0.0192          |
+| 0.0647        | 2.06  | 144  | 0.0386          |
+| 0.0494        | 2.31  | 162  | 0.0334          |
+| 0.0552        | 2.57  | 180  | 0.0293          |
+| 0.0888        | 2.83  | 198  | 0.0295          |
 
 
 ### Framework versions
 
-- PEFT 0.…
-- Transformers 4.…
-- Pytorch 2.…
-- Datasets 2.…
+- PEFT 0.10.1.dev0
+- Transformers 4.40.0.dev0
+- Pytorch 2.1.2+cu118
+- Datasets 2.15.0
 - Tokenizers 0.15.0
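Based on the config in the diff above (`base_model: codellama/CodeLlama-7b-hf`, `adapter: qlora`, `load_in_4bit: true`, `hub_model_id: noeloco/modeltest1`), here is a minimal sketch of how the trained adapter could be loaded for inference with the PEFT and Transformers versions listed under Framework versions. Treat it as an outline under those assumptions, not a tested recipe from this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "codellama/CodeLlama-7b-hf"
adapter_id = "noeloco/modeltest1"  # hub_model_id from the axolotl config

# Mirror the training-time quantization (load_in_4bit: true).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Apply the QLoRA adapter (lora_r: 16, lora_alpha: 8) on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Write a fizzbuzz function in Python."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```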
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:4ca1b3f41f48bd83f6939570330d3b5133250530794bf42ad3cc23a91023705b
+size 80115914
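The adapter_model.bin change above only swaps the Git LFS pointer (a new SHA-256 and file size). A small sketch of how a downloaded copy could be checked against that pointer; the local file path is an assumption.

```python
import hashlib
import os

# Expected values taken from the new LFS pointer in this commit.
EXPECTED_SHA256 = "4ca1b3f41f48bd83f6939570330d3b5133250530794bf42ad3cc23a91023705b"
EXPECTED_SIZE = 80115914

path = "adapter_model.bin"  # assumed local download location

# Check the file size first (cheap), then stream the file through SHA-256.
assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert h.hexdigest() == EXPECTED_SHA256, "hash mismatch"
print("adapter_model.bin matches the LFS pointer")
```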