noeloco committed on
Commit
fa33dfd
1 Parent(s): ea9482b

End of training

Files changed (2)
  1. README.md +158 -1
  2. adapter_model.bin +3 -0
README.md CHANGED
@@ -1,3 +1,160 @@
  ---
- license: apache-2.0
  ---
  ---
+ license: llama2
+ library_name: peft
+ tags:
+ - axolotl
+ - generated_from_trainer
+ base_model: codellama/CodeLlama-7b-hf
+ model-index:
+ - name: modeltest1
+   results: []
  ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.3.0`
+ ```yaml
+ base_model: codellama/CodeLlama-7b-hf
+ model_type: LlamaForCausalLM
+ tokenizer_type: CodeLlamaTokenizer
+ is_llama_derived_model: true
+
+ load_in_8bit: true
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: /tmp/fizzbuzz-ft/
+     data_files: fizzbuzz-output.json
+     type: alpaca
+     ds_type: json
+
+ dataset_prepared_path: noeloco/cameltest
+ val_set_size: 0.05
+ output_dir: ./lora-out
+
+ hub_model_id: noeloco/modeltest1
+ hf_use_auth_token: true
+
+ sequence_len: 2048
+ sample_packing: true
+ eval_sample_packing: False
+ pad_to_sequence_len: true
+
+ adapter: lora
+ lora_model_dir:
+ lora_r: 8
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+
+ wandb_project: runpod1
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ fp16: false
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 10
+ evals_per_epoch: 4
+ saves_per_epoch: 1
+ debug: true
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   bos_token: "<s>"
+   eos_token: "</s>"
+   unk_token: "<unk>"
+
+ ```
+
+ </details><br>
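The LoRA settings in the config (`lora_r: 8`, `lora_alpha: 16`, `lora_target_linear: true`) imply a scaling factor and a trainable-parameter count that can be estimated by hand. The sketch below does this under two assumptions that are not stated in the commit: that the base model has the usual 7B Llama layer shapes (hidden size 4096, MLP size 11008, 32 decoder blocks), and that `lora_target_linear` selects the seven linear projections of each block. The projection names are illustrative.

```python
# Rough estimate of trainable LoRA parameters for the config above.
# ASSUMPTION: 7B Llama layer shapes; these are not given in the commit.
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 11008
NUM_LAYERS = 32

LORA_R = 8       # lora_r in the config
LORA_ALPHA = 16  # lora_alpha in the config

# ASSUMPTION: (in_features, out_features) of every linear layer in one
# decoder block, taken to be what lora_target_linear: true adapts.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_params_per_linear(in_features, out_features, r=LORA_R):
    """LoRA adds A (r x in_features) and B (out_features x r) per layer."""
    return r * (in_features + out_features)


def total_lora_params():
    """Sum the adapter parameters over all blocks."""
    per_block = sum(
        lora_params_per_linear(n_in, n_out)
        for n_in, n_out in LINEAR_SHAPES.values()
    )
    return per_block * NUM_LAYERS


def lora_scaling(alpha=LORA_ALPHA, r=LORA_R):
    """Factor applied to the low-rank update: alpha / r."""
    return alpha / r


if __name__ == "__main__":
    print("trainable LoRA parameters:", total_lora_params())
    print("scaling factor:", lora_scaling())
```

Under these assumptions the estimate is 19,988,480 parameters with a scaling factor of 2.0. At 4 bytes per weight that is roughly 80 MB, which is in the same range as the 80114765-byte `adapter_model.bin` added in this commit.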
+
+ # modeltest1
+
+ This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on a local Alpaca-format JSON dataset (`fizzbuzz-output.json`).
+ It achieves the following results on the evaluation set:
+ - Loss: 2.7345
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
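The card does not document usage. As a minimal sketch, assuming the adapter is reachable on the Hub under the `hub_model_id` from the config and that `peft`, `transformers`, and `bitsandbytes` are installed with a GPU available, the adapter could be attached to the 8-bit base model roughly like this. The function names `load_model_with_adapter` and `generate` are hypothetical helpers, not part of the repository.

```python
BASE_MODEL = "codellama/CodeLlama-7b-hf"  # base_model in the config
ADAPTER_REPO = "noeloco/modeltest1"       # hub_model_id in the config


def load_model_with_adapter(base_model=BASE_MODEL, adapter_repo=ADAPTER_REPO):
    """Load the base model in 8-bit and attach the LoRA adapter to it."""
    # Heavy dependencies are imported lazily so that importing this module
    # does not require a GPU stack.
    from peft import PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(
        base_model,
        # Mirrors load_in_8bit: true from the training config.
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    # ASSUMPTION: the adapter repository is accessible to the caller.
    model = PeftModel.from_pretrained(base, adapter_repo)
    model.eval()
    return tokenizer, model


def generate(prompt, max_new_tokens=64):
    """Generate a completion for a single prompt string."""
    tokenizer, model = load_model_with_adapter()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Since the results table reports a single training step, output quality from this adapter should not be expected to differ much from the base model.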
+
+ ## Training and evaluation data
+
+ More information needed
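The config declares the dataset as `type: alpaca`, which conventionally means JSON records with `instruction`, `input`, and `output` fields. The sketch below shows how such a record is commonly rendered into a prompt. The template text is the widely used Alpaca wording; whether axolotl 0.3.0 renders exactly this text is an assumption, and the example record is hypothetical, not taken from `fizzbuzz-output.json`.

```python
# Commonly used Alpaca prompt templates (ASSUMPTION: matches what the
# trainer rendered; the commit does not show the rendered prompts).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record):
    """Render one Alpaca-style record into the prompt part of an example."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# HYPOTHETICAL record, for illustration only.
EXAMPLE_RECORD = {
    "instruction": "What does FizzBuzz print for the number 15?",
    "input": "",
    "output": "FizzBuzz",
}

if __name__ == "__main__":
    print(build_prompt(EXAMPLE_RECORD) + EXAMPLE_RECORD["output"])
```

With `train_on_inputs: false`, only the tokens of the `output` field would normally contribute to the loss; the prompt part is masked.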
+
+ ## Training procedure
+
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: True
+ - load_in_4bit: False
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: fp4
+ - bnb_4bit_use_double_quant: False
+ - bnb_4bit_compute_dtype: float32
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1
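The reported `total_train_batch_size` follows from the per-device batch size and the gradient accumulation steps. The short sketch below reproduces the arithmetic, assuming a single training device (the commit does not state the device count, but one device is consistent with the reported value of 8) and using `sequence_len: 2048` from the config to bound the tokens per optimizer step.

```python
def total_train_batch_size(micro_batch_size, gradient_accumulation_steps,
                           num_devices=1):
    """Sequences consumed per optimizer step across all devices."""
    return micro_batch_size * gradient_accumulation_steps * num_devices


def max_tokens_per_step(batch_size, sequence_len):
    """Upper bound on tokens per optimizer step when every sequence is
    padded or packed to sequence_len."""
    return batch_size * sequence_len


if __name__ == "__main__":
    # ASSUMPTION: num_devices=1; the other values come from the card.
    batch = total_train_batch_size(micro_batch_size=2,
                                   gradient_accumulation_steps=4)
    print("total_train_batch_size:", batch)
    print("max tokens per step:", max_tokens_per_step(batch, 2048))
```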
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 1.7285        | 1.0   | 1    | 2.7345          |
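The table lists a single step, while the hyperparameters specify a cosine schedule with 10 warmup steps. To illustrate how these interact, here is a minimal sketch of a cosine schedule with linear warmup. The formula is the standard one; `total_steps=100` is a hypothetical value chosen for illustration, and the function is not the trainer's own code.

```python
import math


def lr_at_step(step, base_lr=2e-4, warmup_steps=10, total_steps=100):
    """Learning rate under linear warmup followed by cosine decay to zero.

    base_lr and warmup_steps match the card; total_steps is HYPOTHETICAL.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1, 5, 10, 55, 100):
        print(step, lr_at_step(step))
```

If the run really comprised one optimizer step, as the table suggests, it would have ended while still inside the warmup ramp, well below the nominal learning rate of 0.0002.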
+
+
+ ### Framework versions
+
+ - PEFT 0.7.0
+ - Transformers 4.37.0.dev0
+ - Pytorch 2.0.1+cu118
+ - Datasets 2.16.1
+ - Tokenizers 0.15.0
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e21d98e65be8ce1f9ac99ceb19b3bd3aa73dd696cc7f7c22965f55c745f9b567
+ size 80114765