PocketDoc committed
Commit 9ed0769 (1 parent: 71c9829)

Delete README.md

Files changed (1)
  1. README.md +0 -168
README.md DELETED
---
license: other
base_model: stabilityai/stablelm-2-1_6b
tags:
- generated_from_trainer
model-index:
- name: stablelm_1-6b_ContextSplitter
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: stabilityai/stablelm-2-1_6b
base_model_config: stabilityai/stablelm-2-1_6b
model_type: StableLMEpochForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: /run/media/username/Storage/datasets/repo/alpaca/context-aware-splits-english_new.json
    type: alpaca

dataset_prepared_path: stablelm_1-6b_ContextSplitter_data
val_set_size: 0.02
output_dir: ./stablelm_1-6b_ContextSplitter

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project: stablelm_1-6b_ContextSplitter
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.00001

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true

warmup_steps: 100
evals_per_epoch: 30
eval_table_size:
saves_per_epoch: 4
debug:
deepspeed: #deepspeed_configs/zero2.json # multi-gpu only
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
```

</details><br>
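
For context, a config like this is normally run through axolotl's CLI, e.g. `accelerate launch -m axolotl.cli.train path/to/config.yaml`. The path is a placeholder; this is the standard axolotl 0.4.x invocation rather than anything specific to this repository.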

# stablelm_1-6b_ContextSplitter

This model is a fine-tuned version of [stabilityai/stablelm-2-1_6b](https://huggingface.co/stabilityai/stablelm-2-1_6b) on the `context-aware-splits-english_new.json` Alpaca-format dataset referenced in the axolotl config above.
It achieves the following results on the evaluation set:
- Loss: 0.0377

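As a quick orientation, here is a minimal sketch of loading and prompting the model with `transformers`. The repo id, the Alpaca-style prompt template, and the example instruction are assumptions rather than part of the original card: the repo id is guessed from the model name above, and the prompt format simply mirrors the `type: alpaca` dataset setting in the config.

```python
# Illustrative sketch only; repo id and prompt wording are assumptions (see note above).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "PocketDoc/stablelm_1-6b_ContextSplitter"  # assumed from the model name

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Alpaca-style prompt, matching the `type: alpaca` dataset declared in the config.
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nSplit the following text into coherent, self-contained chunks.\n\n"
    "### Input:\n<text to split>\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
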
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1

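For readers reproducing this outside axolotl, the settings above map to roughly the following plain-PyTorch setup. This is a sketch, not the actual training code: `model` and `total_steps` are placeholders, `AdamW` stands in for the `paged_adamw_32bit` optimizer named in the config, and `weight_decay=0.1` is taken from the config above.

```python
# Illustrative optimizer/scheduler setup using the hyperparameters listed above.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder; stands in for the actual network
total_steps = 7440             # placeholder; axolotl derives this from the dataset

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,                   # learning_rate
    betas=(0.9, 0.999),        # optimizer betas from the card
    eps=1e-8,                  # optimizer epsilon from the card
    weight_decay=0.1,          # weight_decay from the axolotl config
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,      # lr_scheduler_warmup_steps
    num_training_steps=total_steps,
)
```
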
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.1781 | 0.0 | 1 | 0.2283 |
| 0.0709 | 0.03 | 248 | 0.0589 |
| 0.0274 | 0.07 | 496 | 0.0512 |
| 0.0614 | 0.1 | 744 | 0.0480 |
| 0.0266 | 0.13 | 992 | 0.0466 |
| 0.0471 | 0.17 | 1240 | 0.0440 |
| 0.0425 | 0.2 | 1488 | 0.0435 |
| 0.1172 | 0.23 | 1736 | 0.0423 |
| 0.0322 | 0.27 | 1984 | 0.0415 |
| 0.0529 | 0.3 | 2232 | 0.0413 |
| 0.0296 | 0.33 | 2480 | 0.0409 |
| 0.0357 | 0.37 | 2728 | 0.0398 |
| 0.0242 | 0.4 | 2976 | 0.0394 |
| 0.0266 | 0.43 | 3224 | 0.0391 |
| 0.0292 | 0.47 | 3472 | 0.0386 |
| 0.0261 | 0.5 | 3720 | 0.0386 |
| 0.0382 | 0.53 | 3968 | 0.0383 |
| 0.0378 | 0.57 | 4216 | 0.0383 |
| 0.0345 | 0.6 | 4464 | 0.0379 |
| 0.0467 | 0.64 | 4712 | 0.0379 |
| 0.0542 | 0.67 | 4960 | 0.0378 |
| 0.0317 | 0.7 | 5208 | 0.0378 |
| 0.0363 | 0.74 | 5456 | 0.0377 |
| 0.054 | 0.77 | 5704 | 0.0377 |
| 0.0207 | 0.8 | 5952 | 0.0377 |
| 0.0302 | 0.84 | 6200 | 0.0377 |
| 0.0427 | 0.87 | 6448 | 0.0377 |
| 0.0278 | 0.9 | 6696 | 0.0377 |
| 0.0648 | 0.94 | 6944 | 0.0377 |
| 0.0497 | 0.97 | 7192 | 0.0377 |

### Framework versions

- Transformers 4.38.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.0