---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1020
- Num Input Tokens Seen: 38487592
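
Being a standard Transformers causal-LM checkpoint, it should load with the usual `Auto*` classes. A minimal sketch (the repo id below is an assumption; substitute the path where this checkpoint is actually hosted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- replace with the actual hosting path of this checkpoint.
repo_id = "collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Greedy generation from a short prompt, as a smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```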

## Model description

This checkpoint is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b), produced with TRL's SFT trainer over a single epoch (per the `trl`, `sft`, and `generated_from_trainer` tags above). The training dataset is not documented.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
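
For reference, a hedged reconstruction of these settings as Transformers `TrainingArguments`; anything not in the list above (such as the output directory) is a placeholder assumption, and the dataset and trainer wiring are not documented on this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # 8 x 16 accumulation steps = 128 effective
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```

These arguments would then be passed to TRL's `SFTTrainer` together with the (undocumented) training dataset.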

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6684        | 0.0071 | 5    | 1.3917          | 267136            |
| 1.5973        | 0.0143 | 10   | 1.3492          | 527944            |
| 1.4177        | 0.0214 | 15   | 1.2804          | 806832            |
| 1.3667        | 0.0285 | 20   | 1.2239          | 1089448           |
| 1.327         | 0.0356 | 25   | 1.1805          | 1360128           |
| 1.1902        | 0.0428 | 30   | 1.1767          | 1645016           |
| 1.1371        | 0.0499 | 35   | 1.1639          | 1922096           |
| 1.0204        | 0.0570 | 40   | 1.1790          | 2195688           |
| 0.8439        | 0.0642 | 45   | 1.1984          | 2468224           |
| 0.8398        | 0.0713 | 50   | 1.2357          | 2748648           |
| 0.6944        | 0.0784 | 55   | 1.2190          | 3026552           |
| 0.6448        | 0.0855 | 60   | 1.2495          | 3301512           |
| 0.674         | 0.0927 | 65   | 1.2314          | 3571048           |
| 0.5917        | 0.0998 | 70   | 1.2129          | 3845032           |
| 0.4513        | 0.1069 | 75   | 1.2212          | 4112448           |
| 0.4732        | 0.1141 | 80   | 1.2010          | 4388696           |
| 0.5147        | 0.1212 | 85   | 1.2146          | 4668664           |
| 0.4466        | 0.1283 | 90   | 1.1984          | 4940912           |
| 0.3307        | 0.1354 | 95   | 1.2064          | 5215928           |
| 0.4373        | 0.1426 | 100  | 1.1983          | 5491272           |
| 0.4091        | 0.1497 | 105  | 1.1922          | 5771016           |
| 0.3565        | 0.1568 | 110  | 1.1836          | 6042648           |
| 0.4144        | 0.1640 | 115  | 1.1901          | 6319168           |
| 0.3271        | 0.1711 | 120  | 1.1863          | 6595848           |
| 0.3036        | 0.1782 | 125  | 1.1822          | 6874424           |
| 0.247         | 0.1854 | 130  | 1.1854          | 7150560           |
| 0.2981        | 0.1925 | 135  | 1.1752          | 7427296           |
| 0.2897        | 0.1996 | 140  | 1.1820          | 7699936           |
| 0.3774        | 0.2067 | 145  | 1.1722          | 7974304           |
| 0.2749        | 0.2139 | 150  | 1.1697          | 8243304           |
| 0.1711        | 0.2210 | 155  | 1.1795          | 8514432           |
| 0.3155        | 0.2281 | 160  | 1.1652          | 8786576           |
| 0.2774        | 0.2353 | 165  | 1.1709          | 9067648           |
| 0.3152        | 0.2424 | 170  | 1.1679          | 9337744           |
| 0.3076        | 0.2495 | 175  | 1.1645          | 9614672           |
| 0.2671        | 0.2566 | 180  | 1.1619          | 9891496           |
| 0.2063        | 0.2638 | 185  | 1.1608          | 10166192          |
| 0.1924        | 0.2709 | 190  | 1.1600          | 10441352          |
| 0.2558        | 0.2780 | 195  | 1.1575          | 10718632          |
| 0.2587        | 0.2852 | 200  | 1.1601          | 10990920          |
| 0.3404        | 0.2923 | 205  | 1.1566          | 11267848          |
| 0.2668        | 0.2994 | 210  | 1.1547          | 11541440          |
| 0.2414        | 0.3065 | 215  | 1.1554          | 11815968          |
| 0.2503        | 0.3137 | 220  | 1.1508          | 12086520          |
| 0.2804        | 0.3208 | 225  | 1.1537          | 12362432          |
| 0.2019        | 0.3279 | 230  | 1.1510          | 12629384          |
| 0.2269        | 0.3351 | 235  | 1.1474          | 12906600          |
| 0.2972        | 0.3422 | 240  | 1.1543          | 13182328          |
| 0.1945        | 0.3493 | 245  | 1.1487          | 13454848          |
| 0.2719        | 0.3564 | 250  | 1.1463          | 13725400          |
| 0.3308        | 0.3636 | 255  | 1.1463          | 14002992          |
| 0.2309        | 0.3707 | 260  | 1.1442          | 14273016          |
| 0.2641        | 0.3778 | 265  | 1.1388          | 14546376          |
| 0.2995        | 0.3850 | 270  | 1.1452          | 14822144          |
| 0.2778        | 0.3921 | 275  | 1.1409          | 15099184          |
| 0.2189        | 0.3992 | 280  | 1.1374          | 15377816          |
| 0.2998        | 0.4063 | 285  | 1.1414          | 15651240          |
| 0.3122        | 0.4135 | 290  | 1.1391          | 15922608          |
| 0.3337        | 0.4206 | 295  | 1.1342          | 16193632          |
| 0.2351        | 0.4277 | 300  | 1.1360          | 16469976          |
| 0.2763        | 0.4349 | 305  | 1.1346          | 16740760          |
| 0.3261        | 0.4420 | 310  | 1.1370          | 17015216          |
| 0.2783        | 0.4491 | 315  | 1.1364          | 17289608          |
| 0.2433        | 0.4562 | 320  | 1.1320          | 17557448          |
| 0.2029        | 0.4634 | 325  | 1.1329          | 17828456          |
| 0.2399        | 0.4705 | 330  | 1.1352          | 18104216          |
| 0.2676        | 0.4776 | 335  | 1.1298          | 18376544          |
| 0.2009        | 0.4848 | 340  | 1.1345          | 18650968          |
| 0.3097        | 0.4919 | 345  | 1.1312          | 18928000          |
| 0.2695        | 0.4990 | 350  | 1.1259          | 19197288          |
| 0.2933        | 0.5061 | 355  | 1.1309          | 19474976          |
| 0.2231        | 0.5133 | 360  | 1.1298          | 19761168          |
| 0.3188        | 0.5204 | 365  | 1.1267          | 20035664          |
| 0.2614        | 0.5275 | 370  | 1.1306          | 20311304          |
| 0.2824        | 0.5347 | 375  | 1.1279          | 20587848          |
| 0.2569        | 0.5418 | 380  | 1.1238          | 20863952          |
| 0.2747        | 0.5489 | 385  | 1.1257          | 21149864          |
| 0.258         | 0.5561 | 390  | 1.1274          | 21424128          |
| 0.2175        | 0.5632 | 395  | 1.1243          | 21700024          |
| 0.2213        | 0.5703 | 400  | 1.1246          | 21974976          |
| 0.3015        | 0.5774 | 405  | 1.1230          | 22241808          |
| 0.2435        | 0.5846 | 410  | 1.1218          | 22516720          |
| 0.2905        | 0.5917 | 415  | 1.1241          | 22789008          |
| 0.2361        | 0.5988 | 420  | 1.1221          | 23067672          |
| 0.2975        | 0.6060 | 425  | 1.1212          | 23342176          |
| 0.2594        | 0.6131 | 430  | 1.1214          | 23612040          |
| 0.2303        | 0.6202 | 435  | 1.1207          | 23887616          |
| 0.2454        | 0.6273 | 440  | 1.1195          | 24162232          |
| 0.2677        | 0.6345 | 445  | 1.1196          | 24433008          |
| 0.1848        | 0.6416 | 450  | 1.1196          | 24705832          |
| 0.2359        | 0.6487 | 455  | 1.1208          | 24984040          |
| 0.2962        | 0.6559 | 460  | 1.1212          | 25256024          |
| 0.2943        | 0.6630 | 465  | 1.1179          | 25525664          |
| 0.2482        | 0.6701 | 470  | 1.1191          | 25802976          |
| 0.2206        | 0.6772 | 475  | 1.1156          | 26079952          |
| 0.3008        | 0.6844 | 480  | 1.1175          | 26355712          |
| 0.1662        | 0.6915 | 485  | 1.1171          | 26631360          |
| 0.2349        | 0.6986 | 490  | 1.1161          | 26910880          |
| 0.1984        | 0.7058 | 495  | 1.1152          | 27189568          |
| 0.1594        | 0.7129 | 500  | 1.1176          | 27462312          |
| 0.2599        | 0.7200 | 505  | 1.1168          | 27734488          |
| 0.2337        | 0.7271 | 510  | 1.1125          | 28014184          |
| 0.2884        | 0.7343 | 515  | 1.1154          | 28292584          |
| 0.1878        | 0.7414 | 520  | 1.1138          | 28566848          |
| 0.2564        | 0.7485 | 525  | 1.1124          | 28850664          |
| 0.2353        | 0.7557 | 530  | 1.1127          | 29124184          |
| 0.2854        | 0.7628 | 535  | 1.1136          | 29401408          |
| 0.1839        | 0.7699 | 540  | 1.1118          | 29680840          |
| 0.1636        | 0.7770 | 545  | 1.1113          | 29960360          |
| 0.317         | 0.7842 | 550  | 1.1140          | 30233968          |
| 0.267         | 0.7913 | 555  | 1.1101          | 30507104          |
| 0.1583        | 0.7984 | 560  | 1.1127          | 30782136          |
| 0.2464        | 0.8056 | 565  | 1.1143          | 31061608          |
| 0.22          | 0.8127 | 570  | 1.1096          | 31333776          |
| 0.211         | 0.8198 | 575  | 1.1095          | 31608144          |
| 0.3073        | 0.8269 | 580  | 1.1112          | 31876368          |
| 0.1747        | 0.8341 | 585  | 1.1084          | 32146688          |
| 0.2157        | 0.8412 | 590  | 1.1102          | 32419328          |
| 0.2618        | 0.8483 | 595  | 1.1089          | 32690328          |
| 0.2084        | 0.8555 | 600  | 1.1064          | 32960256          |
| 0.2344        | 0.8626 | 605  | 1.1063          | 33234896          |
| 0.2234        | 0.8697 | 610  | 1.1096          | 33509632          |
| 0.2156        | 0.8768 | 615  | 1.1068          | 33781672          |
| 0.3154        | 0.8840 | 620  | 1.1046          | 34058936          |
| 0.2087        | 0.8911 | 625  | 1.1089          | 34334296          |
| 0.1694        | 0.8982 | 630  | 1.1063          | 34603152          |
| 0.2507        | 0.9054 | 635  | 1.1040          | 34874256          |
| 0.2275        | 0.9125 | 640  | 1.1057          | 35144432          |
| 0.2456        | 0.9196 | 645  | 1.1060          | 35423104          |
| 0.236         | 0.9268 | 650  | 1.1071          | 35688376          |
| 0.2216        | 0.9339 | 655  | 1.1074          | 35964360          |
| 0.2621        | 0.9410 | 660  | 1.1058          | 36242960          |
| 0.2174        | 0.9481 | 665  | 1.1031          | 36512112          |
| 0.2301        | 0.9553 | 670  | 1.1044          | 36780048          |
| 0.2529        | 0.9624 | 675  | 1.1049          | 37051992          |
| 0.2614        | 0.9695 | 680  | 1.1038          | 37328608          |
| 0.2334        | 0.9767 | 685  | 1.1023          | 37609592          |
| 0.1567        | 0.9838 | 690  | 1.1042          | 37882008          |
| 0.2197        | 0.9909 | 695  | 1.1037          | 38152448          |
| 0.2266        | 0.9980 | 700  | 1.1021          | 38431096          |


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1