TristanBehrens committed on
Commit 93a3e48
1 Parent(s): c11074a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +160 -0

README.md ADDED
---
license: llama2
library_name: peft
tags:
- generated_from_trainer
base_model: codellama/CodeLlama-7b-hf
model-index:
- name: out/test
  results: []
---

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: codellama/CodeLlama-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: CodeLlamaTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: TristanBehrens/MusicCode_JSFakes_2024_Compose
    type:
      system_prompt: ""
      system_format: ""
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./out/test

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

eval_sample_packing: false
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

```

</details><br>
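Assuming this config is saved locally as, say, `config.yaml` (a hypothetical filename), a run like this one is typically launched through axolotl's standard entry point, e.g. `accelerate launch -m axolotl.cli.train config.yaml`.
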
# out/test

This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on the [TristanBehrens/MusicCode_JSFakes_2024_Compose](https://huggingface.co/datasets/TristanBehrens/MusicCode_JSFakes_2024_Compose) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0553

## Model description

This repository holds a LoRA adapter (r=32, alpha=16, dropout 0.05, applied to all linear layers per `lora_target_linear: true`) for CodeLlama-7b-hf, trained with Axolotl on TristanBehrens/MusicCode_JSFakes_2024_Compose using the `[INST] {instruction} [/INST]` prompt format from the config above. A loading sketch follows below.

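Since the card includes no usage snippet, here is a minimal sketch of loading the adapter with `peft`. The local adapter path (taken from the config's `output_dir`) and the example instruction are assumptions; substitute the Hub repo id if the adapter is published there.

```python
# Minimal inference sketch, not an official snippet from this repo.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model; bf16 here is a convenience choice (training itself
# loaded the base in 8-bit via load_in_8bit: true).
base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Attach the LoRA adapter produced by this run (output_dir in the config).
model = PeftModel.from_pretrained(base, "./out/test")

# The config formats prompts as "[INST] {instruction} [/INST]".
prompt = "[INST] Compose a piece. [/INST]"  # hypothetical instruction
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For lower memory use, the adapter could instead be merged into the base weights with `model.merge_and_unload()` before serving.
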
## Intended uses & limitations

More information needed

## Training and evaluation data

The model was fine-tuned on TristanBehrens/MusicCode_JSFakes_2024_Compose; per the config, 5% of the data was held out as the evaluation set (`val_set_size: 0.05`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: adamw_bnb_8bit (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4

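The composite batch sizes follow directly from the per-device settings: total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices = 4 × 4 × 2 = 32, and total_eval_batch_size = eval_batch_size × num_devices = 4 × 2 = 8.
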
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.1833        | 0.06  | 1    | 0.1833          |
| 0.175         | 0.29  | 5    | 0.1681          |
| 0.1172        | 0.57  | 10   | 0.1097          |
| 0.0917        | 0.86  | 15   | 0.0878          |
| 0.0779        | 1.11  | 20   | 0.0750          |
| 0.0706        | 1.4   | 25   | 0.0682          |
| 0.0642        | 1.69  | 30   | 0.0635          |
| 0.0617        | 1.97  | 35   | 0.0609          |
| 0.0602        | 2.21  | 40   | 0.0588          |
| 0.0574        | 2.5   | 45   | 0.0573          |
| 0.0565        | 2.79  | 50   | 0.0563          |
| 0.0561        | 3.03  | 55   | 0.0558          |
| 0.0566        | 3.31  | 60   | 0.0554          |
| 0.0551        | 3.6   | 65   | 0.0553          |

### Framework versions

- PEFT 0.9.1.dev0
- Transformers 4.39.0.dev0
- Pytorch 2.2.0+cu121
- Datasets 2.17.1
- Tokenizers 0.15.0