besimray committed on
Commit
f143acb
1 Parent(s): fab6de2

End of training

Files changed (3)
  1. README.md +150 -0
  2. adapter_model.bin +3 -0
  3. adapter_model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,150 @@
---
library_name: peft
license: llama3.2
base_model: unsloth/Llama-3.2-1B-Instruct
tags:
- axolotl
- generated_from_trainer
model-index:
- name: miner_id_1_383a850e-bb15-45a2-8f4b-fc96eb001a74_1729712965
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
adapter: lora
base_model: unsloth/Llama-3.2-1B-Instruct
bf16: auto
chat_template: llama3
dataset_prepared_path: null
datasets:
- path: mhenrichsen/alpaca_2k_test
  type: alpaca
debug: null
deepspeed: null
early_stopping_patience: 10
eval_max_new_tokens: 128
eval_steps: 20
eval_table_size: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: besimray/miner_id_1_383a850e-bb15-45a2-8f4b-fc96eb001a74_1729712965
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_steps: 10000
micro_batch_size: 10
mlflow_experiment_name: mhenrichsen/alpaca_2k_test
model_type: LlamaForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_besimray
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 20
save_strategy: steps
sequence_len: 4096
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
val_set_size: 0.05
wandb_entity: besimray24-rayon
wandb_mode: online
wandb_project: Public_TuningSN
wandb_run: miner_id_24
wandb_runid: 383a850e-bb15-45a2-8f4b-fc96eb001a74
warmup_steps: 10
weight_decay: 0.01
xformers_attention: null

```

</details><br>

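For orientation, the LoRA block above (`lora_r: 32`, `lora_alpha: 16`, `lora_dropout: 0.05`, `lora_target_linear: true`) corresponds roughly to the PEFT configuration sketched below. This is an illustrative sketch, not the exact object axolotl constructs; in particular, the `target_modules` list is an assumption based on the linear projection names used by Llama-style models.

```python
from peft import LoraConfig

# Rough PEFT equivalent of the LoRA settings in the axolotl config above.
# target_modules is an assumption: lora_target_linear=true adapts the model's
# linear projections, which for Llama-style architectures are the ones listed here.
lora_config = LoraConfig(
    r=32,               # lora_r
    lora_alpha=16,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```

With `load_in_8bit: true`, the base model weights stay frozen in 8-bit precision and only these low-rank adapter matrices are trained, which is why this repository ships a small `adapter_model.safetensors` rather than full model weights.
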
# miner_id_1_383a850e-bb15-45a2-8f4b-fc96eb001a74_1729712965

This model is a LoRA adapter fine-tuned from [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the mhenrichsen/alpaca_2k_test dataset (see the axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 1.5010

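To try the adapter, one straightforward route is to load the base model with Transformers and attach this repository's adapter with PEFT. The snippet below is a minimal, untested sketch: the prompt and generation settings are placeholders, and it assumes the tokenizer ships the Llama 3 chat template configured above (`chat_template: llama3`).

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/Llama-3.2-1B-Instruct"
adapter_id = "besimray/miner_id_1_383a850e-bb15-45a2-8f4b-fc96eb001a74_1729712965"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

# Format a single-turn prompt with the chat template and generate a reply.
messages = [{"role": "user", "content": "Give three tips for staying healthy."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Alternatively, PEFT's `AutoPeftModelForCausalLM.from_pretrained(adapter_id)` should load the base model and adapter in one call, and `merge_and_unload()` can fold the adapter into the base weights for adapter-free serving.
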
95
+ ## Model description
96
+
97
+ More information needed
98
+
99
+ ## Intended uses & limitations
100
+
101
+ More information needed
102
+
103
+ ## Training and evaluation data
104
+
105
+ More information needed
106
+
107
+ ## Training procedure
108
+
109
+ ### Training hyperparameters
110
+
111
+ The following hyperparameters were used during training:
112
+ - learning_rate: 0.0002
113
+ - train_batch_size: 10
114
+ - eval_batch_size: 10
115
+ - seed: 42
116
+ - gradient_accumulation_steps: 4
117
+ - total_train_batch_size: 40
118
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
119
+ - lr_scheduler_type: cosine
120
+ - lr_scheduler_warmup_steps: 10
121
+ - training_steps: 4750
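
The cosine schedule with warmup can be reproduced with the standard Transformers helper, as sketched below. This assumes the usual warmup-then-cosine-decay shape; the dummy parameter and optimizer exist only to make the snippet self-contained and are not part of the training setup.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy parameter/optimizer, only so the scheduler can be instantiated and inspected.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=2e-4)  # learning_rate: 0.0002

scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,      # lr_scheduler_warmup_steps
    num_training_steps=4750,  # training_steps
)

# Print the learning rate at a few points along the run.
for step in range(4750):
    optimizer.step()
    scheduler.step()
    if step + 1 in (1, 10, 100, 2375, 4750):
        print(f"step {step + 1}: lr = {scheduler.get_last_lr()[0]:.2e}")
```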

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2983 | 0.0211 | 1 | 1.2586 |
| 1.3601 | 0.4211 | 20 | 1.1757 |
| 1.2034 | 0.8421 | 40 | 1.1567 |
| 1.1302 | 1.2632 | 60 | 1.1534 |
| 1.0958 | 1.6842 | 80 | 1.1512 |
| 1.0285 | 2.1053 | 100 | 1.1653 |
| 1.1265 | 2.5263 | 120 | 1.1785 |
| 1.0215 | 2.9474 | 140 | 1.1921 |
| 0.8495 | 3.3684 | 160 | 1.2673 |
| 0.901 | 3.7895 | 180 | 1.2611 |
| 0.7058 | 4.2105 | 200 | 1.3737 |
| 0.7428 | 4.6316 | 220 | 1.3824 |
| 0.4866 | 5.0526 | 240 | 1.4475 |
| 0.5298 | 5.4737 | 260 | 1.5484 |
| 0.5671 | 5.8947 | 280 | 1.5010 |

The run ended at step 280 rather than the configured 4750 steps, which is consistent with the early-stopping settings above: validation loss last improved at step 80 (1.1512), and with `eval_steps: 20` and `early_stopping_patience: 10`, ten further evaluations without improvement end training. The reported loss of 1.5010 is therefore the final evaluation, not the best one.

### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26e3838fdc613d4d229ce1f0c3a0317b1aed29ad70fead6020003107ffdfe1e4
size 90258378
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bedd12fb64e67b0ba54ce9a65703cf9231d8f3947a3a2421c5324ea3c4f4a458
+ oid sha256:f88ee1ce00880c3366f6f0e923ceff9ce3bc2fb04a8a8fefeaaa919364b0e30f
  size 90207248