---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2
  results: []
---


# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results at the final evaluation (step 1175, epoch 0.9992):
- Validation loss: 1.1047
- Num Input Tokens Seen (training tokens processed up to that step): 63448944
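The evaluation loss can also be read as a perplexity. The sketch below assumes the reported loss is the mean token-level cross-entropy in nats, which is what the Transformers `Trainer` usually logs for causal language modeling; the helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) into perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Values taken from this card: the base model before any update (step 0)
# and the final evaluation (step 1175).
base_ppl = perplexity(1.3956)
final_ppl = perplexity(1.1047)
print(f"base: {base_ppl:.2f}, fine-tuned: {final_ppl:.2f}")  # base: 4.04, fine-tuned: 3.02
```

Under that assumption, fine-tuning lowers evaluation perplexity from roughly 4.04 to roughly 3.02.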

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
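The batch-size and schedule settings above relate as sketched below. This is a minimal illustration under two assumptions not stated in the card: training ran on a single device (consistent with 8 × 16 = 128), and the single epoch lasted about 1,176 optimizer steps (inferred from step 1175 being logged at epoch 0.9992). The warmup rounding mirrors `TrainingArguments.get_warmup_steps` (ceiling of ratio × total steps), and the ramp-then-flat shape mirrors `get_constant_schedule_with_warmup` in Transformers; the helper functions themselves are hypothetical.

```python
import math

LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8              # per device
GRADIENT_ACCUMULATION_STEPS = 16
NUM_DEVICES = 1                   # assumption: single device (8 * 16 = 128)
WARMUP_RATIO = 0.05
TOTAL_STEPS = 1176                # assumption: inferred from the results table


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length in optimizer steps, rounded up to a whole step."""
    return math.ceil(total_steps * ratio)


def constant_with_warmup_lr(step: int, base_lr: float, warmup: int) -> float:
    """Linear ramp from 0 to base_lr over `warmup` steps, then constant."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr


warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # 128
print(warmup)                                              # 59
print(constant_with_warmup_lr(0, LEARNING_RATE, warmup))    # 0.0
print(constant_with_warmup_lr(500, LEARNING_RATE, warmup))  # 8e-06
```

With a constant schedule, the learning rate stays at 8e-06 for the remaining ~95% of the epoch rather than decaying.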

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6616        | 0.0043 | 5    | 1.3941          | 269008            |
| 1.5787        | 0.0085 | 10   | 1.3763          | 538784            |
| 1.6061        | 0.0128 | 15   | 1.3402          | 805456            |
| 1.542         | 0.0170 | 20   | 1.2816          | 1075056           |
| 1.5257        | 0.0213 | 25   | 1.2400          | 1336536           |
| 1.4703        | 0.0255 | 30   | 1.1987          | 1607712           |
| 1.2258        | 0.0298 | 35   | 1.1785          | 1874888           |
| 1.1104        | 0.0340 | 40   | 1.1862          | 2141664           |
| 1.1191        | 0.0383 | 45   | 1.1893          | 2412176           |
| 0.9131        | 0.0425 | 50   | 1.2149          | 2679368           |
| 0.8898        | 0.0468 | 55   | 1.2700          | 2942232           |
| 0.7874        | 0.0510 | 60   | 1.2706          | 3205952           |
| 0.5907        | 0.0553 | 65   | 1.2847          | 3468440           |
| 0.4503        | 0.0595 | 70   | 1.2806          | 3740128           |
| 0.5092        | 0.0638 | 75   | 1.2788          | 4009264           |
| 0.4167        | 0.0680 | 80   | 1.2665          | 4274760           |
| 0.417         | 0.0723 | 85   | 1.2501          | 4546256           |
| 0.3088        | 0.0765 | 90   | 1.2446          | 4808944           |
| 0.4041        | 0.0808 | 95   | 1.2336          | 5076456           |
| 0.2974        | 0.0850 | 100  | 1.2323          | 5347376           |
| 0.2938        | 0.0893 | 105  | 1.2199          | 5621736           |
| 0.287         | 0.0935 | 110  | 1.2339          | 5898632           |
| 0.2297        | 0.0978 | 115  | 1.2183          | 6172408           |
| 0.2524        | 0.1020 | 120  | 1.2174          | 6443328           |
| 0.3736        | 0.1063 | 125  | 1.2051          | 6712584           |
| 0.3085        | 0.1106 | 130  | 1.2169          | 6985816           |
| 0.2702        | 0.1148 | 135  | 1.2060          | 7253504           |
| 0.2927        | 0.1191 | 140  | 1.2086          | 7527360           |
| 0.2535        | 0.1233 | 145  | 1.2002          | 7795952           |
| 0.257         | 0.1276 | 150  | 1.1968          | 8063272           |
| 0.1809        | 0.1318 | 155  | 1.2008          | 8341072           |
| 0.2557        | 0.1361 | 160  | 1.1920          | 8613272           |
| 0.164         | 0.1403 | 165  | 1.1975          | 8886320           |
| 0.2581        | 0.1446 | 170  | 1.1859          | 9158168           |
| 0.2019        | 0.1488 | 175  | 1.1917          | 9427592           |
| 0.235         | 0.1531 | 180  | 1.1826          | 9707168           |
| 0.2164        | 0.1573 | 185  | 1.1872          | 9978504           |
| 0.2057        | 0.1616 | 190  | 1.1885          | 10253736          |
| 0.1802        | 0.1658 | 195  | 1.1793          | 10524208          |
| 0.1872        | 0.1701 | 200  | 1.1845          | 10790288          |
| 0.2015        | 0.1743 | 205  | 1.1854          | 11053584          |
| 0.2384        | 0.1786 | 210  | 1.1791          | 11331848          |
| 0.2103        | 0.1828 | 215  | 1.1801          | 11605448          |
| 0.2122        | 0.1871 | 220  | 1.1813          | 11879880          |
| 0.1953        | 0.1913 | 225  | 1.1806          | 12151528          |
| 0.1833        | 0.1956 | 230  | 1.1797          | 12428920          |
| 0.1686        | 0.1998 | 235  | 1.1767          | 12699192          |
| 0.1687        | 0.2041 | 240  | 1.1778          | 12970688          |
| 0.2107        | 0.2083 | 245  | 1.1769          | 13234504          |
| 0.2416        | 0.2126 | 250  | 1.1706          | 13505840          |
| 0.2221        | 0.2168 | 255  | 1.1668          | 13773632          |
| 0.1691        | 0.2211 | 260  | 1.1705          | 14051072          |
| 0.1346        | 0.2254 | 265  | 1.1608          | 14321792          |
| 0.177         | 0.2296 | 270  | 1.1656          | 14600320          |
| 0.2298        | 0.2339 | 275  | 1.1672          | 14873872          |
| 0.1853        | 0.2381 | 280  | 1.1621          | 15147328          |
| 0.2145        | 0.2424 | 285  | 1.1626          | 15417416          |
| 0.1656        | 0.2466 | 290  | 1.1592          | 15689168          |
| 0.2127        | 0.2509 | 295  | 1.1598          | 15955088          |
| 0.1722        | 0.2551 | 300  | 1.1605          | 16222264          |
| 0.2392        | 0.2594 | 305  | 1.1578          | 16489368          |
| 0.1921        | 0.2636 | 310  | 1.1597          | 16761168          |
| 0.1397        | 0.2679 | 315  | 1.1550          | 17029576          |
| 0.138         | 0.2721 | 320  | 1.1551          | 17302520          |
| 0.1682        | 0.2764 | 325  | 1.1585          | 17577312          |
| 0.1804        | 0.2806 | 330  | 1.1512          | 17847720          |
| 0.1808        | 0.2849 | 335  | 1.1523          | 18112320          |
| 0.1563        | 0.2891 | 340  | 1.1576          | 18379416          |
| 0.1718        | 0.2934 | 345  | 1.1521          | 18642848          |
| 0.1676        | 0.2976 | 350  | 1.1514          | 18914928          |
| 0.1584        | 0.3019 | 355  | 1.1501          | 19181856          |
| 0.1449        | 0.3061 | 360  | 1.1490          | 19448128          |
| 0.247         | 0.3104 | 365  | 1.1506          | 19721928          |
| 0.1676        | 0.3146 | 370  | 1.1534          | 19995120          |
| 0.1427        | 0.3189 | 375  | 1.1491          | 20263544          |
| 0.1552        | 0.3231 | 380  | 1.1476          | 20525712          |
| 0.1641        | 0.3274 | 385  | 1.1466          | 20793992          |
| 0.1818        | 0.3317 | 390  | 1.1474          | 21062552          |
| 0.1938        | 0.3359 | 395  | 1.1467          | 21327608          |
| 0.1872        | 0.3402 | 400  | 1.1450          | 21602552          |
| 0.238         | 0.3444 | 405  | 1.1490          | 21874224          |
| 0.1042        | 0.3487 | 410  | 1.1430          | 22143792          |
| 0.1036        | 0.3529 | 415  | 1.1442          | 22424408          |
| 0.1606        | 0.3572 | 420  | 1.1444          | 22693520          |
| 0.188         | 0.3614 | 425  | 1.1438          | 22962496          |
| 0.1836        | 0.3657 | 430  | 1.1462          | 23234648          |
| 0.1706        | 0.3699 | 435  | 1.1426          | 23506408          |
| 0.1614        | 0.3742 | 440  | 1.1425          | 23777032          |
| 0.1609        | 0.3784 | 445  | 1.1433          | 24050192          |
| 0.116         | 0.3827 | 450  | 1.1430          | 24316000          |
| 0.1864        | 0.3869 | 455  | 1.1425          | 24589336          |
| 0.198         | 0.3912 | 460  | 1.1378          | 24861112          |
| 0.1611        | 0.3954 | 465  | 1.1397          | 25136072          |
| 0.1429        | 0.3997 | 470  | 1.1399          | 25403784          |
| 0.1901        | 0.4039 | 475  | 1.1363          | 25670752          |
| 0.2213        | 0.4082 | 480  | 1.1353          | 25945016          |
| 0.1166        | 0.4124 | 485  | 1.1395          | 26220704          |
| 0.1259        | 0.4167 | 490  | 1.1357          | 26484920          |
| 0.2132        | 0.4209 | 495  | 1.1331          | 26752640          |
| 0.1699        | 0.4252 | 500  | 1.1347          | 27018608          |
| 0.0938        | 0.4294 | 505  | 1.1352          | 27286128          |
| 0.1752        | 0.4337 | 510  | 1.1370          | 27562024          |
| 0.1873        | 0.4379 | 515  | 1.1320          | 27830728          |
| 0.1796        | 0.4422 | 520  | 1.1322          | 28103888          |
| 0.1176        | 0.4465 | 525  | 1.1345          | 28371784          |
| 0.0928        | 0.4507 | 530  | 1.1345          | 28642592          |
| 0.1709        | 0.4550 | 535  | 1.1332          | 28903728          |
| 0.1094        | 0.4592 | 540  | 1.1325          | 29178904          |
| 0.1501        | 0.4635 | 545  | 1.1337          | 29448176          |
| 0.1372        | 0.4677 | 550  | 1.1325          | 29717176          |
| 0.1512        | 0.4720 | 555  | 1.1340          | 29984912          |
| 0.1478        | 0.4762 | 560  | 1.1313          | 30258688          |
| 0.1654        | 0.4805 | 565  | 1.1306          | 30521056          |
| 0.165         | 0.4847 | 570  | 1.1319          | 30792392          |
| 0.1263        | 0.4890 | 575  | 1.1324          | 31061264          |
| 0.1196        | 0.4932 | 580  | 1.1299          | 31330912          |
| 0.1268        | 0.4975 | 585  | 1.1305          | 31603912          |
| 0.1234        | 0.5017 | 590  | 1.1312          | 31884080          |
| 0.1143        | 0.5060 | 595  | 1.1285          | 32152232          |
| 0.1784        | 0.5102 | 600  | 1.1281          | 32414424          |
| 0.1548        | 0.5145 | 605  | 1.1310          | 32688920          |
| 0.202         | 0.5187 | 610  | 1.1276          | 32959712          |
| 0.2025        | 0.5230 | 615  | 1.1271          | 33233432          |
| 0.2025        | 0.5272 | 620  | 1.1291          | 33504392          |
| 0.1724        | 0.5315 | 625  | 1.1266          | 33777920          |
| 0.1809        | 0.5357 | 630  | 1.1255          | 34045208          |
| 0.2091        | 0.5400 | 635  | 1.1266          | 34316536          |
| 0.1236        | 0.5442 | 640  | 1.1257          | 34588848          |
| 0.2578        | 0.5485 | 645  | 1.1225          | 34861720          |
| 0.1594        | 0.5528 | 650  | 1.1229          | 35137320          |
| 0.0931        | 0.5570 | 655  | 1.1263          | 35408808          |
| 0.1531        | 0.5613 | 660  | 1.1285          | 35680680          |
| 0.1458        | 0.5655 | 665  | 1.1248          | 35946696          |
| 0.1638        | 0.5698 | 670  | 1.1234          | 36213456          |
| 0.0762        | 0.5740 | 675  | 1.1252          | 36478736          |
| 0.1295        | 0.5783 | 680  | 1.1270          | 36751144          |
| 0.1237        | 0.5825 | 685  | 1.1246          | 37020688          |
| 0.1947        | 0.5868 | 690  | 1.1251          | 37290280          |
| 0.185         | 0.5910 | 695  | 1.1239          | 37559352          |
| 0.1981        | 0.5953 | 700  | 1.1241          | 37820632          |
| 0.171         | 0.5995 | 705  | 1.1214          | 38095952          |
| 0.1491        | 0.6038 | 710  | 1.1216          | 38355560          |
| 0.0939        | 0.6080 | 715  | 1.1226          | 38631968          |
| 0.0722        | 0.6123 | 720  | 1.1237          | 38901632          |
| 0.1797        | 0.6165 | 725  | 1.1198          | 39171656          |
| 0.1558        | 0.6208 | 730  | 1.1189          | 39443808          |
| 0.2049        | 0.6250 | 735  | 1.1207          | 39714152          |
| 0.1406        | 0.6293 | 740  | 1.1193          | 39986024          |
| 0.1522        | 0.6335 | 745  | 1.1207          | 40259512          |
| 0.0855        | 0.6378 | 750  | 1.1193          | 40528328          |
| 0.1577        | 0.6420 | 755  | 1.1210          | 40806056          |
| 0.1875        | 0.6463 | 760  | 1.1228          | 41080264          |
| 0.1831        | 0.6505 | 765  | 1.1172          | 41347064          |
| 0.1624        | 0.6548 | 770  | 1.1169          | 41627368          |
| 0.1936        | 0.6590 | 775  | 1.1189          | 41895808          |
| 0.1859        | 0.6633 | 780  | 1.1177          | 42171680          |
| 0.1319        | 0.6676 | 785  | 1.1170          | 42446136          |
| 0.1279        | 0.6718 | 790  | 1.1168          | 42718504          |
| 0.1451        | 0.6761 | 795  | 1.1177          | 42992080          |
| 0.1529        | 0.6803 | 800  | 1.1186          | 43262448          |
| 0.1099        | 0.6846 | 805  | 1.1203          | 43529920          |
| 0.1659        | 0.6888 | 810  | 1.1191          | 43797688          |
| 0.1703        | 0.6931 | 815  | 1.1194          | 44075648          |
| 0.1344        | 0.6973 | 820  | 1.1199          | 44341096          |
| 0.1972        | 0.7016 | 825  | 1.1171          | 44614744          |
| 0.1174        | 0.7058 | 830  | 1.1168          | 44887432          |
| 0.1518        | 0.7101 | 835  | 1.1198          | 45158376          |
| 0.1729        | 0.7143 | 840  | 1.1176          | 45424984          |
| 0.1381        | 0.7186 | 845  | 1.1167          | 45693168          |
| 0.1236        | 0.7228 | 850  | 1.1210          | 45969040          |
| 0.1639        | 0.7271 | 855  | 1.1176          | 46238320          |
| 0.2011        | 0.7313 | 860  | 1.1147          | 46514712          |
| 0.1606        | 0.7356 | 865  | 1.1172          | 46778816          |
| 0.1503        | 0.7398 | 870  | 1.1166          | 47045128          |
| 0.1572        | 0.7441 | 875  | 1.1153          | 47313360          |
| 0.1193        | 0.7483 | 880  | 1.1190          | 47582504          |
| 0.1329        | 0.7526 | 885  | 1.1170          | 47848336          |
| 0.1922        | 0.7568 | 890  | 1.1138          | 48111768          |
| 0.1721        | 0.7611 | 895  | 1.1151          | 48383224          |
| 0.1415        | 0.7653 | 900  | 1.1139          | 48647984          |
| 0.214         | 0.7696 | 905  | 1.1118          | 48921112          |
| 0.2069        | 0.7739 | 910  | 1.1152          | 49188248          |
| 0.1435        | 0.7781 | 915  | 1.1134          | 49456432          |
| 0.1642        | 0.7824 | 920  | 1.1133          | 49725712          |
| 0.1598        | 0.7866 | 925  | 1.1138          | 49986872          |
| 0.1459        | 0.7909 | 930  | 1.1109          | 50260592          |
| 0.1139        | 0.7951 | 935  | 1.1126          | 50535000          |
| 0.1806        | 0.7994 | 940  | 1.1131          | 50804760          |
| 0.1549        | 0.8036 | 945  | 1.1120          | 51071224          |
| 0.1602        | 0.8079 | 950  | 1.1095          | 51345824          |
| 0.1818        | 0.8121 | 955  | 1.1135          | 51613104          |
| 0.1792        | 0.8164 | 960  | 1.1131          | 51881448          |
| 0.1803        | 0.8206 | 965  | 1.1121          | 52152368          |
| 0.2375        | 0.8249 | 970  | 1.1108          | 52424208          |
| 0.1872        | 0.8291 | 975  | 1.1119          | 52690144          |
| 0.1566        | 0.8334 | 980  | 1.1111          | 52954608          |
| 0.1376        | 0.8376 | 985  | 1.1098          | 53220256          |
| 0.0842        | 0.8419 | 990  | 1.1110          | 53492064          |
| 0.1268        | 0.8461 | 995  | 1.1118          | 53761488          |
| 0.1792        | 0.8504 | 1000 | 1.1117          | 54027464          |
| 0.1417        | 0.8546 | 1005 | 1.1098          | 54298720          |
| 0.1595        | 0.8589 | 1010 | 1.1113          | 54574696          |
| 0.1297        | 0.8631 | 1015 | 1.1114          | 54842864          |
| 0.1904        | 0.8674 | 1020 | 1.1122          | 55107880          |
| 0.1061        | 0.8716 | 1025 | 1.1122          | 55381152          |
| 0.1769        | 0.8759 | 1030 | 1.1085          | 55649992          |
| 0.1567        | 0.8801 | 1035 | 1.1075          | 55919072          |
| 0.203         | 0.8844 | 1040 | 1.1093          | 56191424          |
| 0.1557        | 0.8887 | 1045 | 1.1089          | 56464008          |
| 0.21          | 0.8929 | 1050 | 1.1084          | 56737864          |
| 0.2126        | 0.8972 | 1055 | 1.1089          | 57003496          |
| 0.1087        | 0.9014 | 1060 | 1.1076          | 57278832          |
| 0.1838        | 0.9057 | 1065 | 1.1090          | 57551424          |
| 0.1381        | 0.9099 | 1070 | 1.1097          | 57824320          |
| 0.1953        | 0.9142 | 1075 | 1.1083          | 58091512          |
| 0.2044        | 0.9184 | 1080 | 1.1065          | 58358824          |
| 0.1871        | 0.9227 | 1085 | 1.1077          | 58626864          |
| 0.1504        | 0.9269 | 1090 | 1.1078          | 58889800          |
| 0.1559        | 0.9312 | 1095 | 1.1067          | 59159160          |
| 0.2046        | 0.9354 | 1100 | 1.1091          | 59430264          |
| 0.2033        | 0.9397 | 1105 | 1.1066          | 59697408          |
| 0.1562        | 0.9439 | 1110 | 1.1046          | 59965984          |
| 0.1652        | 0.9482 | 1115 | 1.1079          | 60236760          |
| 0.1624        | 0.9524 | 1120 | 1.1068          | 60502576          |
| 0.1708        | 0.9567 | 1125 | 1.1058          | 60770544          |
| 0.1041        | 0.9609 | 1130 | 1.1055          | 61037512          |
| 0.1748        | 0.9652 | 1135 | 1.1061          | 61313056          |
| 0.1736        | 0.9694 | 1140 | 1.1059          | 61577720          |
| 0.15          | 0.9737 | 1145 | 1.1074          | 61847592          |
| 0.1312        | 0.9779 | 1150 | 1.1078          | 62104056          |
| 0.2414        | 0.9822 | 1155 | 1.1051          | 62373576          |
| 0.1648        | 0.9864 | 1160 | 1.1045          | 62645296          |
| 0.1681        | 0.9907 | 1165 | 1.1086          | 62911584          |
| 0.1334        | 0.9950 | 1170 | 1.1087          | 63184416          |
| 0.1686        | 0.9992 | 1175 | 1.1047          | 63448944          |
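
The loss reported at the top of this card is the last evaluation, not necessarily the lowest one. The sketch below pulls both out of the results table; it parses a handful of rows copied verbatim from the table, and the function name `parse_rows` is hypothetical.

```python
# A few rows copied verbatim from the "Training results" table.
TABLE_EXCERPT = """\
| 0.1562        | 0.9439 | 1110 | 1.1046          | 59965984          |
| 0.1648        | 0.9864 | 1160 | 1.1045          | 62645296          |
| 0.1681        | 0.9907 | 1165 | 1.1086          | 62911584          |
| 0.1686        | 0.9992 | 1175 | 1.1047          | 63448944          |
"""


def parse_rows(markdown: str) -> list:
    """Parse table rows into dicts; header and separator rows are skipped."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5 or not cells[2].isdigit():
            continue  # header, separator, or malformed line
        rows.append({
            "train_loss": None if cells[0] == "No log" else float(cells[0]),
            "epoch": float(cells[1]),
            "step": int(cells[2]),
            "val_loss": float(cells[3]),
            "tokens_seen": int(cells[4]),
        })
    return rows


rows = parse_rows(TABLE_EXCERPT)
final = rows[-1]
best = min(rows, key=lambda r: r["val_loss"])
print(f"final: step {final['step']}, val loss {final['val_loss']}")  # final: step 1175, val loss 1.1047
print(f"best: step {best['step']}, val loss {best['val_loss']}")     # best: step 1160, val loss 1.1045
```

Over the full table the result is the same: the lowest validation loss is 1.1045 at step 1160, marginally below the final 1.1047.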


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
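
To compare a local environment against these versions, a minimal sketch using the standard-library `importlib.metadata` is shown below. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and treats `+cu121` as a local version label (a CUDA 12.1 build), so only the public version part is compared. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def public_version(v: str) -> str:
    """Drop a local version label such as '+cu121'."""
    return v.split("+", 1)[0]


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed
        if installed is None or public_version(installed) != public_version(wanted):
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```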