noeloco committed
Commit f4819f9
Parent: dc99e17

End of training

Files changed (2)
  1. README.md +33 -28
  2. adapter_model.bin +2 -2
README.md CHANGED
@@ -16,20 +16,22 @@ should probably proofread and complete it, then remove this comment. -->
  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
  <details><summary>See axolotl config</summary>

- axolotl version: `0.3.0`
+ axolotl version: `0.4.0`
  ```yaml
  base_model: codellama/CodeLlama-7b-hf
  model_type: LlamaForCausalLM
  tokenizer_type: CodeLlamaTokenizer
  is_llama_derived_model: true

- load_in_8bit: true
- load_in_4bit: false
+ hub_model_id: noeloco/modeltest1
+
+ load_in_8bit: false
+ load_in_4bit: true
  strict: false

  datasets:
-   - path: /tmp/fizzbuzz-ft/
-     data_files: fizzbuzz-output.json
+   - path: /tmp/fizzbuzz-ft/datasets
+     data_files: /tmp/fizzbuzz-ft/datasets/training-set-alpaca.json
      type: alpaca
      ds_type: json

@@ -38,16 +40,16 @@ push_dataset_to_hub: noeloco
  val_set_size: 0.05
  output_dir: ./lora-out
  chat_template: chatml
- hub_model_id: noeloco/modeltest1
+

  sequence_len: 2048
  sample_packing: false
  pad_to_sequence_len: true

- adapter: lora
+ adapter: qlora
  lora_model_dir:
- lora_r: 8
- lora_alpha: 16
+ lora_r: 16
+ lora_alpha: 8
  lora_dropout: 0.05
  lora_target_linear: true
  lora_fan_in_fan_out:
@@ -58,10 +60,10 @@ wandb_watch:
  wandb_name:
  wandb_log_model:

- gradient_accumulation_steps: 4
+ gradient_accumulation_steps: 1
  micro_batch_size: 2
- num_epochs: 2
- optimizer: adamw_bnb_8bit
+ num_epochs: 3
+ optimizer: paged_adamw_32bit
  lr_scheduler: cosine
  learning_rate: 0.0002

@@ -77,7 +79,7 @@ resume_from_checkpoint:
  local_rank:
  logging_steps: 1
  xformers_attention:
- flash_attention: false
+ flash_attention: true

  warmup_steps: 10
  evals_per_epoch: 4
@@ -100,7 +102,7 @@ special_tokens:

  This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1202
+ - Loss: 0.0295

  ## Model description

@@ -123,30 +125,33 @@ The following hyperparameters were used during training:
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - num_epochs: 2
+ - num_epochs: 3

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 1.5644 | 0.06 | 1 | 2.7399 |
- | 1.575 | 0.29 | 5 | 2.6344 |
- | 1.1169 | 0.57 | 10 | 1.2350 |
- | 0.6719 | 0.86 | 15 | 0.5019 |
- | 0.3372 | 1.14 | 20 | 0.2525 |
- | 0.3403 | 1.43 | 25 | 0.1470 |
- | 0.1656 | 1.71 | 30 | 0.1202 |
+ | 2.0177 | 0.01 | 1 | 2.5549 |
+ | 0.603 | 0.26 | 18 | 0.8667 |
+ | 0.3026 | 0.51 | 36 | 0.2340 |
+ | 0.0977 | 0.77 | 54 | 0.1274 |
+ | 0.1101 | 1.03 | 72 | 0.1098 |
+ | 0.0503 | 1.29 | 90 | 0.0469 |
+ | 0.0753 | 1.54 | 108 | 0.0516 |
+ | 0.2285 | 1.8 | 126 | 0.0192 |
+ | 0.0647 | 2.06 | 144 | 0.0386 |
+ | 0.0494 | 2.31 | 162 | 0.0334 |
+ | 0.0552 | 2.57 | 180 | 0.0293 |
+ | 0.0888 | 2.83 | 198 | 0.0295 |


  ### Framework versions

- - PEFT 0.7.2.dev0
- - Transformers 4.37.0
- - Pytorch 2.0.1+cu118
- - Datasets 2.16.1
+ - PEFT 0.10.1.dev0
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.1.2+cu118
+ - Datasets 2.15.0
  - Tokenizers 0.15.0
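For reference, a minimal sketch of using the result of this run: the updated config switches to a 4-bit QLoRA setup and sets `hub_model_id: noeloco/modeltest1`, so the base model can be loaded quantized with bitsandbytes and the adapter attached with PEFT. The repo id, base model, and `load_in_4bit` come from the config above; the NF4 quant type and bfloat16 compute dtype are assumptions, not stated on the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit loading mirrors `load_in_4bit: true` in the config; quant type/dtype are assumed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Attach the LoRA adapter pushed by this run (`hub_model_id` in the config above).
model = PeftModel.from_pretrained(base, "noeloco/modeltest1")
model.eval()
```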
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:54e31da3d13f2c089248c59cf1951dfe80272e8d85a2e8ce4cbefb2aaa759ee1
- size 80114765
+ oid sha256:4ca1b3f41f48bd83f6939570330d3b5133250530794bf42ad3cc23a91023705b
+ size 80115914
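The `adapter_model.bin` change only swaps the Git LFS pointer: a new SHA-256 object id and byte size. A small sketch of checking a locally downloaded copy against those pointer fields follows; the local filename `adapter_model.bin` is an assumption.

```python
import hashlib
import os

# Values taken from the new LFS pointer in this commit.
EXPECTED_SHA256 = "4ca1b3f41f48bd83f6939570330d3b5133250530794bf42ad3cc23a91023705b"
EXPECTED_SIZE = 80115914  # bytes

def verify_lfs_object(path: str) -> bool:
    """Compare a downloaded file against the oid/size recorded in the LFS pointer."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == EXPECTED_SHA256 and os.path.getsize(path) == EXPECTED_SIZE

print(verify_lfs_object("adapter_model.bin"))
```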