---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1156
- Num Input Tokens Seen: 38394432
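
The usage sections below are not yet filled in, so here is a minimal loading sketch rather than official usage guidance. The repo id is a placeholder (substitute the actual Hub path where this checkpoint is published), `device_map="auto"` assumes the `accelerate` package is installed, and Gemma-based weights may require accepting the Gemma license and authenticating with the Hub.

```python
# Minimal loading sketch; the repo id below is a placeholder, not the real Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```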

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a hedged `TrainingArguments` sketch reproducing them follows the list:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
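
Below is a sketch of `TrainingArguments` matching the settings above; it is a reconstruction, not the original training script (the `trl`/`sft` tags suggest TRL's `SFTTrainer` was used, whose `SFTConfig` accepts the same fields). Note that the total train batch size of 128 is derived: 8 per-device batch × 16 accumulation steps, assuming a single device.

```python
# Reconstruction of the hyperparameters listed above (not the original script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # effective train batch: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```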

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.64          | 0.0071 | 5    | 1.3915          | 282928            |
| 1.717         | 0.0142 | 10   | 1.3495          | 547680            |
| 1.4756        | 0.0214 | 15   | 1.2809          | 819464            |
| 1.3413        | 0.0285 | 20   | 1.2255          | 1088176           |
| 1.2434        | 0.0356 | 25   | 1.1810          | 1359440           |
| 1.2176        | 0.0427 | 30   | 1.1672          | 1625784           |
| 1.2541        | 0.0499 | 35   | 1.1491          | 1899896           |
| 0.9819        | 0.0570 | 40   | 1.1533          | 2176760           |
| 0.947         | 0.0641 | 45   | 1.1622          | 2458784           |
| 0.8886        | 0.0712 | 50   | 1.1769          | 2731336           |
| 0.7859        | 0.0784 | 55   | 1.2131          | 3004608           |
| 0.7724        | 0.0855 | 60   | 1.2111          | 3276648           |
| 0.8257        | 0.0926 | 65   | 1.2124          | 3552744           |
| 0.7196        | 0.0997 | 70   | 1.2153          | 3828616           |
| 0.7089        | 0.1068 | 75   | 1.2123          | 4108840           |
| 0.7354        | 0.1140 | 80   | 1.2026          | 4391920           |
| 0.6275        | 0.1211 | 85   | 1.2205          | 4674200           |
| 0.5129        | 0.1282 | 90   | 1.2144          | 4945712           |
| 0.4506        | 0.1353 | 95   | 1.2009          | 5214520           |
| 0.5107        | 0.1425 | 100  | 1.2186          | 5484592           |
| 0.4638        | 0.1496 | 105  | 1.2054          | 5752320           |
| 0.4786        | 0.1567 | 110  | 1.2011          | 6028136           |
| 0.5751        | 0.1638 | 115  | 1.2009          | 6304032           |
| 0.4034        | 0.1710 | 120  | 1.2037          | 6579840           |
| 0.3894        | 0.1781 | 125  | 1.1952          | 6855056           |
| 0.4096        | 0.1852 | 130  | 1.1990          | 7132912           |
| 0.486         | 0.1923 | 135  | 1.1961          | 7401704           |
| 0.3722        | 0.1994 | 140  | 1.1943          | 7674144           |
| 0.3758        | 0.2066 | 145  | 1.1971          | 7955296           |
| 0.3871        | 0.2137 | 150  | 1.1955          | 8232712           |
| 0.3788        | 0.2208 | 155  | 1.1905          | 8504176           |
| 0.3235        | 0.2279 | 160  | 1.1879          | 8779072           |
| 0.3315        | 0.2351 | 165  | 1.1902          | 9059672           |
| 0.328         | 0.2422 | 170  | 1.1905          | 9336368           |
| 0.3476        | 0.2493 | 175  | 1.1880          | 9601712           |
| 0.2789        | 0.2564 | 180  | 1.1829          | 9871144           |
| 0.2937        | 0.2636 | 185  | 1.1835          | 10137584          |
| 0.3359        | 0.2707 | 190  | 1.1815          | 10406656          |
| 0.3616        | 0.2778 | 195  | 1.1803          | 10677608          |
| 0.3162        | 0.2849 | 200  | 1.1794          | 10948264          |
| 0.3174        | 0.2920 | 205  | 1.1750          | 11218000          |
| 0.2904        | 0.2992 | 210  | 1.1806          | 11498160          |
| 0.3929        | 0.3063 | 215  | 1.1692          | 11779608          |
| 0.2965        | 0.3134 | 220  | 1.1731          | 12049808          |
| 0.4205        | 0.3205 | 225  | 1.1692          | 12326136          |
| 0.2849        | 0.3277 | 230  | 1.1736          | 12596680          |
| 0.3107        | 0.3348 | 235  | 1.1665          | 12869960          |
| 0.2267        | 0.3419 | 240  | 1.1724          | 13145648          |
| 0.2392        | 0.3490 | 245  | 1.1708          | 13415312          |
| 0.1885        | 0.3562 | 250  | 1.1657          | 13690584          |
| 0.2722        | 0.3633 | 255  | 1.1676          | 13968448          |
| 0.2161        | 0.3704 | 260  | 1.1651          | 14239944          |
| 0.1734        | 0.3775 | 265  | 1.1659          | 14510952          |
| 0.3554        | 0.3846 | 270  | 1.1580          | 14780912          |
| 0.316         | 0.3918 | 275  | 1.1608          | 15055568          |
| 0.2742        | 0.3989 | 280  | 1.1562          | 15334424          |
| 0.1887        | 0.4060 | 285  | 1.1580          | 15606264          |
| 0.3007        | 0.4131 | 290  | 1.1570          | 15876168          |
| 0.1913        | 0.4203 | 295  | 1.1507          | 16146352          |
| 0.2763        | 0.4274 | 300  | 1.1523          | 16420864          |
| 0.3037        | 0.4345 | 305  | 1.1499          | 16693096          |
| 0.1839        | 0.4416 | 310  | 1.1526          | 16976408          |
| 0.2314        | 0.4488 | 315  | 1.1499          | 17252728          |
| 0.2425        | 0.4559 | 320  | 1.1526          | 17521216          |
| 0.2362        | 0.4630 | 325  | 1.1487          | 17788696          |
| 0.2139        | 0.4701 | 330  | 1.1502          | 18057744          |
| 0.2801        | 0.4773 | 335  | 1.1443          | 18332304          |
| 0.3707        | 0.4844 | 340  | 1.1458          | 18610592          |
| 0.2548        | 0.4915 | 345  | 1.1450          | 18881784          |
| 0.2455        | 0.4986 | 350  | 1.1418          | 19146128          |
| 0.2278        | 0.5057 | 355  | 1.1452          | 19420384          |
| 0.2771        | 0.5129 | 360  | 1.1420          | 19696584          |
| 0.2731        | 0.5200 | 365  | 1.1394          | 19967720          |
| 0.219         | 0.5271 | 370  | 1.1415          | 20241272          |
| 0.2432        | 0.5342 | 375  | 1.1457          | 20514896          |
| 0.1841        | 0.5414 | 380  | 1.1429          | 20779312          |
| 0.2617        | 0.5485 | 385  | 1.1404          | 21056016          |
| 0.2928        | 0.5556 | 390  | 1.1404          | 21327080          |
| 0.1952        | 0.5627 | 395  | 1.1354          | 21598992          |
| 0.227         | 0.5699 | 400  | 1.1381          | 21877208          |
| 0.2218        | 0.5770 | 405  | 1.1380          | 22149176          |
| 0.1683        | 0.5841 | 410  | 1.1375          | 22423056          |
| 0.3227        | 0.5912 | 415  | 1.1348          | 22693424          |
| 0.3058        | 0.5983 | 420  | 1.1357          | 22966920          |
| 0.1881        | 0.6055 | 425  | 1.1341          | 23246936          |
| 0.2359        | 0.6126 | 430  | 1.1314          | 23522192          |
| 0.2074        | 0.6197 | 435  | 1.1307          | 23801944          |
| 0.2584        | 0.6268 | 440  | 1.1328          | 24074328          |
| 0.2027        | 0.6340 | 445  | 1.1289          | 24348328          |
| 0.2897        | 0.6411 | 450  | 1.1305          | 24623816          |
| 0.2167        | 0.6482 | 455  | 1.1309          | 24902928          |
| 0.3028        | 0.6553 | 460  | 1.1306          | 25174984          |
| 0.2939        | 0.6625 | 465  | 1.1287          | 25447728          |
| 0.2679        | 0.6696 | 470  | 1.1262          | 25716008          |
| 0.3617        | 0.6767 | 475  | 1.1275          | 25994912          |
| 0.3261        | 0.6838 | 480  | 1.1266          | 26270048          |
| 0.2113        | 0.6909 | 485  | 1.1270          | 26541616          |
| 0.3059        | 0.6981 | 490  | 1.1287          | 26818200          |
| 0.2356        | 0.7052 | 495  | 1.1242          | 27087272          |
| 0.2931        | 0.7123 | 500  | 1.1246          | 27359208          |
| 0.2421        | 0.7194 | 505  | 1.1233          | 27638688          |
| 0.2792        | 0.7266 | 510  | 1.1252          | 27911800          |
| 0.2415        | 0.7337 | 515  | 1.1214          | 28186904          |
| 0.292         | 0.7408 | 520  | 1.1222          | 28462520          |
| 0.2697        | 0.7479 | 525  | 1.1214          | 28740360          |
| 0.2745        | 0.7551 | 530  | 1.1196          | 29013592          |
| 0.2365        | 0.7622 | 535  | 1.1221          | 29285096          |
| 0.2456        | 0.7693 | 540  | 1.1199          | 29557536          |
| 0.2182        | 0.7764 | 545  | 1.1208          | 29835096          |
| 0.3136        | 0.7835 | 550  | 1.1219          | 30112088          |
| 0.184         | 0.7907 | 555  | 1.1167          | 30387312          |
| 0.2508        | 0.7978 | 560  | 1.1200          | 30659104          |
| 0.2854        | 0.8049 | 565  | 1.1208          | 30939024          |
| 0.2423        | 0.8120 | 570  | 1.1186          | 31214856          |
| 0.3061        | 0.8192 | 575  | 1.1174          | 31487176          |
| 0.2599        | 0.8263 | 580  | 1.1176          | 31758936          |
| 0.1641        | 0.8334 | 585  | 1.1192          | 32029768          |
| 0.3293        | 0.8405 | 590  | 1.1180          | 32306824          |
| 0.1687        | 0.8477 | 595  | 1.1187          | 32583424          |
| 0.2466        | 0.8548 | 600  | 1.1157          | 32855528          |
| 0.2684        | 0.8619 | 605  | 1.1151          | 33131344          |
| 0.2623        | 0.8690 | 610  | 1.1156          | 33412888          |
| 0.3949        | 0.8761 | 615  | 1.1167          | 33688992          |
| 0.2317        | 0.8833 | 620  | 1.1167          | 33963096          |
| 0.2483        | 0.8904 | 625  | 1.1147          | 34243336          |
| 0.3731        | 0.8975 | 630  | 1.1142          | 34521472          |
| 0.2577        | 0.9046 | 635  | 1.1143          | 34794832          |
| 0.2225        | 0.9118 | 640  | 1.1139          | 35064072          |
| 0.1567        | 0.9189 | 645  | 1.1146          | 35342008          |
| 0.3207        | 0.9260 | 650  | 1.1146          | 35610720          |
| 0.1626        | 0.9331 | 655  | 1.1153          | 35880752          |
| 0.2122        | 0.9403 | 660  | 1.1138          | 36156864          |
| 0.2865        | 0.9474 | 665  | 1.1110          | 36433816          |
| 0.2319        | 0.9545 | 670  | 1.1134          | 36713952          |
| 0.1696        | 0.9616 | 675  | 1.1129          | 36980552          |
| 0.2326        | 0.9687 | 680  | 1.1120          | 37256536          |
| 0.2783        | 0.9759 | 685  | 1.1133          | 37524184          |
| 0.2046        | 0.9830 | 690  | 1.1113          | 37805352          |
| 0.2798        | 0.9901 | 695  | 1.1119          | 38079104          |
| 0.2794        | 0.9972 | 700  | 1.1159          | 38340280          |


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
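
To reproduce results against these exact versions, the following optional sketch (not part of the original card) checks the local environment:

```python
# Compare installed package versions against those listed above.
import importlib.metadata as md

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for pkg, want in expected.items():
    have = md.version(pkg)
    print(f"{pkg}: {have}" + ("" if have == want else f"  (card lists {want})"))
```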