myBit-Llama2-jp-127M-3

This model was fine-tuned from an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 13.0221
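
To put the reported loss in perspective, it can be converted to a perplexity. This is a minimal sketch that assumes the value is a mean per-token cross-entropy in nats; the card does not state how the loss was computed, so treat the result as indicative only.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 13.0221

# Assumption: the loss is a mean per-token cross-entropy in nats,
# in which case perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:,.0f}")
```

A perplexity in the hundreds of thousands would mean the model is highly uncertain on the evaluation set.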

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
num_epochs: 100
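
The learning rate implied by these settings can be sketched in plain Python. The card gives only the scheduler type, so the end learning rate (1e-7), the power (1.0), and the absence of warmup are assumptions here, not values from the card. The total of 800 steps is taken from the last row of the training results table.

```python
def polynomial_lr(step, total_steps=800, lr_init=1e-3,
                  lr_end=1e-7, power=1.0, warmup_steps=0):
    """Learning rate at a given optimizer step under polynomial decay.

    lr_end, power and warmup_steps are assumed values; the card lists
    only learning_rate (0.001) and lr_scheduler_type (polynomial).
    """
    if step < warmup_steps:
        # Linear warmup from 0 to lr_init (unused when warmup_steps == 0).
        return lr_init * step / max(1, warmup_steps)
    if step >= total_steps:
        return lr_end
    decay_steps = total_steps - warmup_steps
    remaining = 1 - (step - warmup_steps) / decay_steps
    return (lr_init - lr_end) * remaining ** power + lr_end


# Print the rate at a few steps across the run.
for s in (0, 200, 400, 600, 800):
    print(s, polynomial_lr(s))
```

With a power of 1.0 this reduces to a linear decay from 0.001 to the end value over the run.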

Training results

Training Loss	Epoch	Step	Validation Loss
7.8184	1.25	10	8.3355
5.4327	2.5	20	7.6000
5.0861	3.75	30	7.8126
4.7586	5.0	40	7.5748
4.4392	6.25	50	7.4509
4.1938	7.5	60	7.3834
4.0095	8.75	70	7.2750
3.905	10.0	80	7.3800
3.6536	11.25	90	7.4560
3.3187	12.5	100	7.6310
3.3315	13.75	110	8.0397
2.9308	15.0	120	8.3902
2.679	16.25	130	9.0364
2.2896	17.5	140	9.8766
1.8407	18.75	150	10.7682
1.5081	20.0	160	11.7175
0.9778	21.25	170	12.8239
0.6572	22.5	180	13.6506
0.5411	23.75	190	14.2579
0.44	25.0	200	14.5732
0.3283	26.25	210	15.1087
0.2507	27.5	220	15.0569
0.2044	28.75	230	15.1893
0.1838	30.0	240	15.6291
0.1626	31.25	250	15.4617
0.1124	32.5	260	15.2738
0.1011	33.75	270	15.2130
0.0845	35.0	280	15.2749
0.0852	36.25	290	15.3292
0.1025	37.5	300	15.1574
0.1075	38.75	310	15.1100
0.079	40.0	320	14.8177
0.0857	41.25	330	14.8609
0.0629	42.5	340	14.6443
0.0713	43.75	350	14.5514
0.0594	45.0	360	14.6032
0.0557	46.25	370	14.3489
0.0554	47.5	380	14.3289
0.0548	48.75	390	14.1991
0.0528	50.0	400	14.1350
0.0515	51.25	410	13.9952
0.0529	52.5	420	13.9788
0.0516	53.75	430	13.9438
0.0506	55.0	440	13.8746
0.049	56.25	450	13.7564
0.0491	57.5	460	13.7900
0.0493	58.75	470	13.6992
0.0491	60.0	480	13.6421
0.0497	61.25	490	13.6419
0.0489	62.5	500	13.5448
0.0504	63.75	510	13.5048
0.0508	65.0	520	13.5077
0.0488	66.25	530	13.5045
0.0485	67.5	540	13.4404
0.0493	68.75	550	13.4167
0.0507	70.0	560	13.3758
0.0491	71.25	570	13.3239
0.0484	72.5	580	13.3139
0.0472	73.75	590	13.2933
0.0493	75.0	600	13.3105
0.0475	76.25	610	13.2306
0.0465	77.5	620	13.2378
0.0474	78.75	630	13.2074
0.0468	80.0	640	13.1871
0.0466	81.25	650	13.2055
0.0459	82.5	660	13.1327
0.0466	83.75	670	13.1801
0.0485	85.0	680	13.1610
0.046	86.25	690	13.1439
0.0467	87.5	700	13.1114
0.0455	88.75	710	13.1123
0.0456	90.0	720	13.0635
0.0447	91.25	730	13.0997
0.0449	92.5	740	13.0704
0.0453	93.75	750	13.0531
0.0451	95.0	760	13.0432
0.0442	96.25	770	13.0311
0.0444	97.5	780	13.0329
0.0432	98.75	790	13.0491
0.0442	100.0	800	13.0221
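
The table shows training loss falling steadily while validation loss bottoms out early and then climbs, which is the usual signature of overfitting. The sketch below illustrates this with a handful of rows copied from the table (a subset chosen to include the row with the lowest validation loss). The variable names are illustrative only.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above.
rows = [
    (1.25, 10, 7.8184, 8.3355),
    (5.0, 40, 4.7586, 7.5748),
    (8.75, 70, 4.0095, 7.2750),
    (10.0, 80, 3.905, 7.3800),
    (20.0, 160, 1.5081, 11.7175),
    (30.0, 240, 0.1838, 15.6291),
    (50.0, 400, 0.0528, 14.1350),
    (100.0, 800, 0.0442, 13.0221),
]

# Row with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
final = rows[-1]

# Optimizer steps per epoch, derived from the first row (10 steps / 1.25 epochs).
steps_per_epoch = rows[0][1] / rows[0][0]

# Gap between validation and training loss at the end of training.
gap = final[3] - final[2]

print(f"best validation loss {best[3]} at step {best[1]} (epoch {best[0]})")
print(f"steps per epoch: {steps_per_epoch}")
print(f"final validation-minus-training gap: {gap:.4f}")
```

Eight steps per epoch at a batch size of 32 suggests at most about 256 training examples per epoch, assuming a single device and no gradient accumulation (neither is stated in the card). On these numbers, a checkpoint near step 70 would generalize better than the final one.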

Framework versions

Transformers 4.39.1
PyTorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2
