|
---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
---
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
[gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) is a rich dataset of high-quality synthetic Text-to-SQL samples. It contains 105,851 records, partitioned into 100,000 train and 5,851 test records; I used only 50,000 of the training records for this fine-tune.
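For reference, here is a minimal sketch of loading the dataset and drawing the 50,000-record subset with the `datasets` library. The selection strategy (taking the first 50,000 rows) is an assumption for illustration, not necessarily the exact preprocessing used:

```python
from datasets import load_dataset

# Load the full synthetic Text-to-SQL dataset
# (100,000 train / 5,851 test records).
dataset = load_dataset("gretelai/synthetic_text_to_sql")

# Keep 50,000 training records. Taking the first 50,000 rows is an
# assumed strategy; a shuffled sample would work equally well.
train_subset = dataset["train"].select(range(50_000))
print(train_subset)  # Dataset({..., num_rows: 50000})
```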
|
### Training Results
|
|
|
|
|
Training loss, logged every 10 steps:

| Step | Training Loss |
|------|---------------|
| 10 | 1.296000 |
| 20 | 1.331600 |
| 30 | 1.279400 |
| 40 | 1.312900 |
| 50 | 1.274100 |
| 60 | 1.271700 |
| 70 | 1.209100 |
| 80 | 1.192600 |
| 90 | 1.176700 |
| 100 | 1.118300 |
| 110 | 1.086800 |
| 120 | 1.048000 |
| 130 | 1.019500 |
| 140 | 1.001400 |
| 150 | 0.994300 |
| 160 | 0.934900 |
| 170 | 0.904500 |
| 180 | 0.879900 |
| 190 | 0.850400 |
| 200 | 0.828000 |
| 210 | 0.811400 |
| 220 | 0.846000 |
| 230 | 0.791100 |
| 240 | 0.766900 |
| 250 | 0.782000 |
| 260 | 0.718300 |
| 270 | 0.701800 |
| 280 | 0.720000 |
| 290 | 0.693600 |
| 300 | 0.676500 |
| 310 | 0.679900 |
| 320 | 0.673200 |
| 330 | 0.669500 |
| 340 | 0.692800 |
| 350 | 0.662200 |
| 360 | 0.761200 |
| 370 | 0.659600 |
| 380 | 0.683700 |
| 390 | 0.681200 |
| 400 | 0.674000 |
| 410 | 0.651800 |
| 420 | 0.641800 |
| 430 | 0.646500 |
| 440 | 0.664200 |
| 450 | 0.633600 |
| 460 | 0.646900 |
| 470 | 0.643400 |
| 480 | 0.658800 |
| 490 | 0.631500 |
| 500 | 0.678200 |
| 510 | 0.633400 |
| 520 | 0.623300 |
| 530 | 0.655700 |
| 540 | 0.631500 |
| 550 | 0.617700 |
| 560 | 0.644000 |
| 570 | 0.650200 |
| 580 | 0.618500 |
| 590 | 0.615400 |
| 600 | 0.614000 |
| 610 | 0.612800 |
| 620 | 0.616900 |
| 630 | 0.640200 |
| 640 | 0.613000 |
| 650 | 0.611400 |
| 660 | 0.617000 |
| 670 | 0.629800 |
| 680 | 0.648800 |
| 690 | 0.608800 |
| 700 | 0.603200 |
| 710 | 0.628200 |
| 720 | 0.629700 |
| 730 | 0.604400 |
| 740 | 0.610700 |
| 750 | 0.621300 |
| 760 | 0.617900 |
| 770 | 0.596500 |
| 780 | 0.612800 |
| 790 | 0.611700 |
| 800 | 0.618600 |
| 810 | 0.590900 |
| 820 | 0.590300 |
| 830 | 0.592900 |
| 840 | 0.611700 |
| 850 | 0.628300 |
| 860 | 0.590100 |
| 870 | 0.584800 |
| 880 | 0.591200 |
| 890 | 0.585900 |
| 900 | 0.607000 |
| 910 | 0.578800 |
| 920 | 0.576600 |
| 930 | 0.597600 |
| 940 | 0.602100 |
| 950 | 0.579000 |
| 960 | 0.597900 |
| 970 | 0.590600 |
| 980 | 0.606100 |
| 990 | 0.577600 |
| 1000 | 0.584000 |
| 1010 | 0.569300 |
| 1020 | 0.594000 |
| 1030 | 0.596100 |
| 1040 | 0.590600 |
| 1050 | 0.570300 |
| 1060 | 0.572800 |
| 1070 | 0.572200 |
| 1080 | 0.569900 |
| 1090 | 0.587200 |
| 1100 | 0.572200 |
| 1110 | 0.569700 |
| 1120 | 0.612500 |
| 1130 | 0.587800 |
| 1140 | 0.568100 |
| 1150 | 0.573100 |
| 1160 | 0.568300 |
| 1170 | 0.620800 |
| 1180 | 0.570600 |
| 1190 | 0.561500 |
| 1200 | 0.560200 |
| 1210 | 0.592400 |
| 1220 | 0.580500 |
| 1230 | 0.578300 |
| 1240 | 0.573400 |
| 1250 | 0.568800 |
| 1260 | 0.600500 |
| 1270 | 0.578800 |
| 1280 | 0.561300 |
| 1290 | 0.570900 |
| 1300 | 0.567700 |
| 1310 | 0.589800 |
| 1320 | 0.598200 |
| 1330 | 0.564900 |
| 1340 | 0.577500 |
| 1350 | 0.565700 |
| 1360 | 0.581400 |
| 1370 | 0.562000 |
| 1380 | 0.588200 |
| 1390 | 0.603800 |
| 1400 | 0.560300 |
| 1410 | 0.559600 |
| 1420 | 0.567000 |
| 1430 | 0.562700 |
| 1440 | 0.564200 |
| 1450 | 0.563700 |
| 1460 | 0.561100 |
| 1470 | 0.561100 |
| 1480 | 0.561600 |
| 1490 | 0.564800 |
| 1500 | 0.579100 |
| 1510 | 0.564100 |
| 1520 | 0.562900 |
| 1530 | 0.569800 |
| 1540 | 0.566200 |
| 1550 | 0.599100 |
| 1560 | 0.562000 |
| 1570 | 0.580600 |
| 1580 | 0.564900 |
| 1590 | 0.571900 |
| 1600 | 0.580000 |
| 1610 | 0.559200 |
| 1620 | 0.566900 |
| 1630 | 0.556100 |
|
|
|
|
|
![Training loss curve](https://cdn-uploads.huggingface.co/production/uploads/66465899a15e2eb8fd53727d/UNamiG8HciSUBxfS2erbv.png)
|
|
|
#### Training Hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- num_train_epochs: 3
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4
- optim: adamw_torch_fused
- learning_rate: 2e-4
- max_grad_norm: 0.3
- weight_decay: 0.01
- lr_scheduler_type: cosine
- warmup_steps: 50
- bf16: True
- tf32: True
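These fields map one-to-one onto `transformers.TrainingArguments`; with per_device_train_batch_size=2 and gradient_accumulation_steps=4, the effective batch size is 8 per device. A minimal sketch of the configuration (`output_dir` and `logging_steps` are assumed values; logging_steps=10 would match the cadence of the loss table above):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-text-to-sql",  # assumed name for illustration
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size: 2 x 4 = 8 per device
    optim="adamw_torch_fused",
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    logging_steps=10,                # assumed; consistent with the loss table
    bf16=True,                       # bfloat16 mixed precision
    tf32=True,                       # TF32 matmuls on Ampere+ GPUs
)
```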
|
|
|
|