base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
Training Details
Training Data
gretelai/synthetic_text_to_sql https://huggingface.co/datasets/gretelai/synthetic_text_to_sql gretelai/synthetic_text_to_sql is a rich dataset of high quality synthetic Text-to-SQL samples. The dataset includes 105,851 records partitioned into 100,000 train and 5,851 test records. But i used only 50k records for my training.
Training Result
Step Training Loss
10 1.296000
20 1.331600
30 1.279400
40 1.312900
50 1.274100
60 1.271700
70 1.209100
80 1.192600
90 1.176700
100 1.118300
110 1.086800
120 1.048000
130 1.019500
140 1.001400
150 0.994300
160 0.934900
170 0.904500
180 0.879900
190 0.850400
200 0.828000
210 0.811400
220 0.846000
230 0.791100
240 0.766900
250 0.782000
260 0.718300
270 0.701800
280 0.720000
290 0.693600
300 0.676500
310 0.679900
320 0.673200
330 0.669500
340 0.692800
350 0.662200
360 0.761200
370 0.659600
380 0.683700
390 0.681200
400 0.674000
410 0.651800
420 0.641800
430 0.646500
440 0.664200
450 0.633600
460 0.646900
470 0.643400
480 0.658800
490 0.631500
500 0.678200
510 0.633400
520 0.623300
530 0.655700
540 0.631500
550 0.617700
560 0.644000
570 0.650200
580 0.618500
590 0.615400
600 0.614000
610 0.612800
620 0.616900
630 0.640200
640 0.613000
650 0.611400
660 0.617000
670 0.629800
680 0.648800
690 0.608800
700 0.603200
710 0.628200
720 0.629700
730 0.604400
740 0.610700
750 0.621300
760 0.617900
770 0.596500
780 0.612800
790 0.611700
800 0.618600
810 0.590900
820 0.590300
830 0.592900
840 0.611700
850 0.628300
860 0.590100
870 0.584800
880 0.591200
890 0.585900
900 0.607000
910 0.578800
920 0.576600
930 0.597600
940 0.602100
950 0.579000
960 0.597900
970 0.590600
980 0.606100
990 0.577600
1000 0.584000
1010 0.569300
1020 0.594000
1030 0.596100
1040 0.590600
1050 0.570300
1060 0.572800
1070 0.572200
1080 0.569900
1090 0.587200
1100 0.572200
1110 0.569700
1120 0.612500
1130 0.587800
1140 0.568100
1150 0.573100
1160 0.568300
1170 0.620800
1180 0.570600
1190 0.561500
1200 0.560200
1210 0.592400
1220 0.580500
1230 0.578300
1240 0.573400
1250 0.568800
1260 0.600500
1270 0.578800
1280 0.561300
1290 0.570900
1300 0.567700
1310 0.589800
1320 0.598200
1330 0.564900
1340 0.577500
1350 0.565700
1360 0.581400
1370 0.562000
1380 0.588200
1390 0.603800
1400 0.560300
1410 0.559600
1420 0.567000
1430 0.562700
1440 0.564200
1450 0.563700
1460 0.561100
1470 0.561100
1480 0.561600
1490 0.564800
1500 0.579100
1510 0.564100
1520 0.562900
1530 0.569800
1540 0.566200
1550 0.599100
1560 0.562000
1570 0.580600
1580 0.564900
1590 0.571900
1600 0.580000
1610 0.559200
1620 0.566900
1630 0.556100
Training Hyperparameters
The following hyperparameters were used during training:
num_train_epochs=3,
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
optim="adamw_torch_fused",
learning_rate=2e-4,
max_grad_norm=0.3,
weight_decay=0.01,
lr_scheduler_type="cosine",
warmup_steps=50,
bf16=True,
tf32=True,
)