# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1114
- Num Input Tokens Seen: 30129080
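
As a quick way to try the checkpoint, here is a minimal inference sketch, assuming the weights are available under the repo ID `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0`; the prompt and generation settings are purely illustrative.

```python
# Minimal inference sketch; prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`; drop this argument to load on the default device
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```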
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
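
For reference, a sketch of how these values would map onto Hugging Face `TrainingArguments`; dataset preparation, model loading, and the `Trainer` call are omitted, and the output directory name is illustrative.

```python
# Sketch of TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```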
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6587 | 0.0092 | 5 | 1.3863 | 276696 |
1.5356 | 0.0185 | 10 | 1.3193 | 550488 |
1.4519 | 0.0277 | 15 | 1.2505 | 831264 |
1.3766 | 0.0369 | 20 | 1.1950 | 1107760 |
1.3159 | 0.0461 | 25 | 1.1635 | 1389896 |
1.1803 | 0.0554 | 30 | 1.1441 | 1668904 |
1.1339 | 0.0646 | 35 | 1.1371 | 1949864 |
0.9903 | 0.0738 | 40 | 1.1480 | 2233824 |
1.0179 | 0.0830 | 45 | 1.1589 | 2509936 |
0.9916 | 0.0923 | 50 | 1.1672 | 2793040 |
0.9015 | 0.1015 | 55 | 1.1839 | 3068680 |
0.8562 | 0.1107 | 60 | 1.1852 | 3347056 |
0.8485 | 0.1200 | 65 | 1.1998 | 3627512 |
0.7508 | 0.1292 | 70 | 1.2026 | 3905976 |
0.7357 | 0.1384 | 75 | 1.2045 | 4179328 |
0.6496 | 0.1476 | 80 | 1.1934 | 4453160 |
0.7891 | 0.1569 | 85 | 1.1950 | 4735096 |
0.5708 | 0.1661 | 90 | 1.1959 | 5015384 |
0.607 | 0.1753 | 95 | 1.2026 | 5284160 |
0.5427 | 0.1845 | 100 | 1.1955 | 5555648 |
0.4434 | 0.1938 | 105 | 1.1935 | 5839504 |
0.4716 | 0.2030 | 110 | 1.1997 | 6113904 |
0.5612 | 0.2122 | 115 | 1.1869 | 6394080 |
0.5522 | 0.2215 | 120 | 1.1934 | 6668968 |
0.4752 | 0.2307 | 125 | 1.1917 | 6943216 |
0.3948 | 0.2399 | 130 | 1.1873 | 7224944 |
0.4525 | 0.2491 | 135 | 1.1890 | 7499080 |
0.5147 | 0.2584 | 140 | 1.1814 | 7773104 |
0.4881 | 0.2676 | 145 | 1.1917 | 8050400 |
0.3915 | 0.2768 | 150 | 1.1842 | 8332168 |
0.4032 | 0.2860 | 155 | 1.1897 | 8608296 |
0.4227 | 0.2953 | 160 | 1.1804 | 8887936 |
0.4128 | 0.3045 | 165 | 1.1838 | 9164856 |
0.4097 | 0.3137 | 170 | 1.1759 | 9448376 |
0.3663 | 0.3230 | 175 | 1.1841 | 9721256 |
0.4311 | 0.3322 | 180 | 1.1780 | 9999808 |
0.3765 | 0.3414 | 185 | 1.1763 | 10273504 |
0.4953 | 0.3506 | 190 | 1.1663 | 10554248 |
0.3491 | 0.3599 | 195 | 1.1760 | 10835664 |
0.5705 | 0.3691 | 200 | 1.1670 | 11117936 |
0.3433 | 0.3783 | 205 | 1.1677 | 11394272 |
0.366 | 0.3875 | 210 | 1.1675 | 11667112 |
0.3678 | 0.3968 | 215 | 1.1643 | 11940344 |
0.3999 | 0.4060 | 220 | 1.1664 | 12226416 |
0.2779 | 0.4152 | 225 | 1.1623 | 12509896 |
0.2937 | 0.4245 | 230 | 1.1625 | 12789696 |
0.3232 | 0.4337 | 235 | 1.1577 | 13067376 |
0.2727 | 0.4429 | 240 | 1.1603 | 13347168 |
0.4066 | 0.4521 | 245 | 1.1549 | 13623832 |
0.3169 | 0.4614 | 250 | 1.1554 | 13902696 |
0.3345 | 0.4706 | 255 | 1.1557 | 14188000 |
0.3015 | 0.4798 | 260 | 1.1543 | 14470712 |
0.3465 | 0.4890 | 265 | 1.1519 | 14746408 |
0.3225 | 0.4983 | 270 | 1.1479 | 15018640 |
0.2737 | 0.5075 | 275 | 1.1483 | 15296928 |
0.3426 | 0.5167 | 280 | 1.1429 | 15574408 |
0.3332 | 0.5260 | 285 | 1.1446 | 15847000 |
0.2775 | 0.5352 | 290 | 1.1413 | 16126256 |
0.3818 | 0.5444 | 295 | 1.1398 | 16403872 |
0.402 | 0.5536 | 300 | 1.1409 | 16683200 |
0.3527 | 0.5629 | 305 | 1.1387 | 16957856 |
0.3747 | 0.5721 | 310 | 1.1381 | 17237088 |
0.2767 | 0.5813 | 315 | 1.1398 | 17514672 |
0.397 | 0.5905 | 320 | 1.1353 | 17790912 |
0.2713 | 0.5998 | 325 | 1.1355 | 18067224 |
0.3836 | 0.6090 | 330 | 1.1335 | 18345448 |
0.2953 | 0.6182 | 335 | 1.1340 | 18625288 |
0.3032 | 0.6275 | 340 | 1.1339 | 18895360 |
0.3337 | 0.6367 | 345 | 1.1315 | 19176592 |
0.2324 | 0.6459 | 350 | 1.1368 | 19456384 |
0.3954 | 0.6551 | 355 | 1.1290 | 19736048 |
0.3867 | 0.6644 | 360 | 1.1316 | 20017992 |
0.2376 | 0.6736 | 365 | 1.1317 | 20299128 |
0.2497 | 0.6828 | 370 | 1.1302 | 20572064 |
0.2433 | 0.6920 | 375 | 1.1295 | 20847344 |
0.3257 | 0.7013 | 380 | 1.1262 | 21131912 |
0.3596 | 0.7105 | 385 | 1.1299 | 21410128 |
0.3307 | 0.7197 | 390 | 1.1261 | 21691144 |
0.3911 | 0.7290 | 395 | 1.1277 | 21972080 |
0.3247 | 0.7382 | 400 | 1.1245 | 22254672 |
0.3654 | 0.7474 | 405 | 1.1262 | 22539544 |
0.2657 | 0.7566 | 410 | 1.1235 | 22820048 |
0.3721 | 0.7659 | 415 | 1.1242 | 23096928 |
0.2776 | 0.7751 | 420 | 1.1227 | 23369624 |
0.2669 | 0.7843 | 425 | 1.1249 | 23652232 |
0.3584 | 0.7935 | 430 | 1.1227 | 23931024 |
0.4058 | 0.8028 | 435 | 1.1194 | 24211728 |
0.271 | 0.8120 | 440 | 1.1246 | 24490376 |
0.2958 | 0.8212 | 445 | 1.1206 | 24772424 |
0.2507 | 0.8304 | 450 | 1.1214 | 25054744 |
0.3209 | 0.8397 | 455 | 1.1193 | 25331320 |
0.2983 | 0.8489 | 460 | 1.1173 | 25606720 |
0.302 | 0.8581 | 465 | 1.1181 | 25890600 |
0.4136 | 0.8674 | 470 | 1.1165 | 26167160 |
0.3069 | 0.8766 | 475 | 1.1179 | 26448160 |
0.2351 | 0.8858 | 480 | 1.1173 | 26723544 |
0.2373 | 0.8950 | 485 | 1.1175 | 27006408 |
0.3894 | 0.9043 | 490 | 1.1146 | 27281088 |
0.277 | 0.9135 | 495 | 1.1174 | 27562296 |
0.3009 | 0.9227 | 500 | 1.1151 | 27833952 |
0.3229 | 0.9319 | 505 | 1.1139 | 28106704 |
0.2891 | 0.9412 | 510 | 1.1161 | 28385768 |
0.2745 | 0.9504 | 515 | 1.1128 | 28670136 |
0.3377 | 0.9596 | 520 | 1.1158 | 28953688 |
0.3045 | 0.9689 | 525 | 1.1126 | 29230304 |
0.2475 | 0.9781 | 530 | 1.1150 | 29509224 |
0.2633 | 0.9873 | 535 | 1.1121 | 29791512 |
0.2622 | 0.9965 | 540 | 1.1105 | 30074936 |
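
Assuming the reported loss is the standard per-token cross-entropy in nats (as with the default causal-LM objective in Transformers), it can be converted to perplexity; a quick check on the final evaluation loss:

```python
import math

final_eval_loss = 1.1114  # final evaluation loss reported above
print(f"perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 3.04
```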
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1