---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
---
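
This is a PEFT adapter trained on top of Meta-Llama-3.1-8B-Instruct. Below is a minimal loading sketch; the adapter repo id is a placeholder (the card does not state this repository's id), and the dtype/device settings are assumptions.

```python
# Minimal sketch, assuming a standard PEFT adapter layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "SujanKarki/text-to-sql-adapter"  # hypothetical repo id, not confirmed by the card

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
```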

## Training Details

### Training Data

[gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) is a rich dataset of high-quality synthetic Text-to-SQL samples. It contains 105,851 records partitioned into 100,000 train and 5,851 test records; only 50,000 of the training records were used for this fine-tune. A loading sketch is shown below.
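
As a sketch, the 50k-record subset can be taken from the dataset's train split with `datasets`. The card does not state how the subset was chosen, so taking the first 50,000 records via `select` is an assumption.

```python
from datasets import load_dataset

# Load the 100k-record train split and keep 50k records for fine-tuning.
# Taking the first 50k is an assumption; the selection strategy is not
# documented in this card.
train = load_dataset("gretelai/synthetic_text_to_sql", split="train")
train_50k = train.select(range(50_000))
print(train_50k.num_rows)  # 50000
```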

### Training Results

Training loss, logged every 10 steps (read left to right, top to bottom):

| Step | Training Loss | Step | Training Loss | Step | Training Loss | Step | Training Loss |
|------|---------------|------|---------------|------|---------------|------|---------------|
| 10 | 1.296000 | 20 | 1.331600 | 30 | 1.279400 | 40 | 1.312900 |
| 50 | 1.274100 | 60 | 1.271700 | 70 | 1.209100 | 80 | 1.192600 |
| 90 | 1.176700 | 100 | 1.118300 | 110 | 1.086800 | 120 | 1.048000 |
| 130 | 1.019500 | 140 | 1.001400 | 150 | 0.994300 | 160 | 0.934900 |
| 170 | 0.904500 | 180 | 0.879900 | 190 | 0.850400 | 200 | 0.828000 |
| 210 | 0.811400 | 220 | 0.846000 | 230 | 0.791100 | 240 | 0.766900 |
| 250 | 0.782000 | 260 | 0.718300 | 270 | 0.701800 | 280 | 0.720000 |
| 290 | 0.693600 | 300 | 0.676500 | 310 | 0.679900 | 320 | 0.673200 |
| 330 | 0.669500 | 340 | 0.692800 | 350 | 0.662200 | 360 | 0.761200 |
| 370 | 0.659600 | 380 | 0.683700 | 390 | 0.681200 | 400 | 0.674000 |
| 410 | 0.651800 | 420 | 0.641800 | 430 | 0.646500 | 440 | 0.664200 |
| 450 | 0.633600 | 460 | 0.646900 | 470 | 0.643400 | 480 | 0.658800 |
| 490 | 0.631500 | 500 | 0.678200 | 510 | 0.633400 | 520 | 0.623300 |
| 530 | 0.655700 | 540 | 0.631500 | 550 | 0.617700 | 560 | 0.644000 |
| 570 | 0.650200 | 580 | 0.618500 | 590 | 0.615400 | 600 | 0.614000 |
| 610 | 0.612800 | 620 | 0.616900 | 630 | 0.640200 | 640 | 0.613000 |
| 650 | 0.611400 | 660 | 0.617000 | 670 | 0.629800 | 680 | 0.648800 |
| 690 | 0.608800 | 700 | 0.603200 | 710 | 0.628200 | 720 | 0.629700 |
| 730 | 0.604400 | 740 | 0.610700 | 750 | 0.621300 | 760 | 0.617900 |
| 770 | 0.596500 | 780 | 0.612800 | 790 | 0.611700 | 800 | 0.618600 |
| 810 | 0.590900 | 820 | 0.590300 | 830 | 0.592900 | 840 | 0.611700 |
| 850 | 0.628300 | 860 | 0.590100 | 870 | 0.584800 | 880 | 0.591200 |
| 890 | 0.585900 | 900 | 0.607000 | 910 | 0.578800 | 920 | 0.576600 |
| 930 | 0.597600 | 940 | 0.602100 | 950 | 0.579000 | 960 | 0.597900 |
| 970 | 0.590600 | 980 | 0.606100 | 990 | 0.577600 | 1000 | 0.584000 |
| 1010 | 0.569300 | 1020 | 0.594000 | 1030 | 0.596100 | 1040 | 0.590600 |
| 1050 | 0.570300 | 1060 | 0.572800 | 1070 | 0.572200 | 1080 | 0.569900 |
| 1090 | 0.587200 | 1100 | 0.572200 | 1110 | 0.569700 | 1120 | 0.612500 |
| 1130 | 0.587800 | 1140 | 0.568100 | 1150 | 0.573100 | 1160 | 0.568300 |
| 1170 | 0.620800 | 1180 | 0.570600 | 1190 | 0.561500 | 1200 | 0.560200 |
| 1210 | 0.592400 | 1220 | 0.580500 | 1230 | 0.578300 | 1240 | 0.573400 |
| 1250 | 0.568800 | 1260 | 0.600500 | 1270 | 0.578800 | 1280 | 0.561300 |
| 1290 | 0.570900 | 1300 | 0.567700 | 1310 | 0.589800 | 1320 | 0.598200 |
| 1330 | 0.564900 | 1340 | 0.577500 | 1350 | 0.565700 | 1360 | 0.581400 |
| 1370 | 0.562000 | 1380 | 0.588200 | 1390 | 0.603800 | 1400 | 0.560300 |
| 1410 | 0.559600 | 1420 | 0.567000 | 1430 | 0.562700 | 1440 | 0.564200 |
| 1450 | 0.563700 | 1460 | 0.561100 | 1470 | 0.561100 | 1480 | 0.561600 |
| 1490 | 0.564800 | 1500 | 0.579100 | 1510 | 0.564100 | 1520 | 0.562900 |
| 1530 | 0.569800 | 1540 | 0.566200 | 1550 | 0.599100 | 1560 | 0.562000 |
| 1570 | 0.580600 | 1580 | 0.564900 | 1590 | 0.571900 | 1600 | 0.580000 |
| 1610 | 0.559200 | 1620 | 0.566900 | 1630 | 0.556100 | | |


### Training Hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- num_train_epochs: 3
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4
- optim: adamw_torch_fused
- learning_rate: 2e-4
- max_grad_norm: 0.3
- weight_decay: 0.01
- lr_scheduler_type: cosine
- warmup_steps: 50
- bf16: True
- tf32: True
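
These values map directly onto `transformers.TrainingArguments`. The sketch below reproduces the listed values; `output_dir` and `logging_steps` are assumptions not stated in the card (`logging_steps=10` matches the cadence of the loss table above).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-text-to-sql",  # hypothetical; not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8 per device
    optim="adamw_torch_fused",
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    bf16=True,                       # bfloat16 mixed precision
    tf32=True,                       # TF32 matmuls (Ampere+ GPUs)
    logging_steps=10,                # assumption; matches the loss log cadence
)
```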