license: cc-by-4.0
SchemaPile Foreign Key Detection Model (StarCoder)
Model Description
This repository contains starcoder-schemapile-fk, a language model based on BigCode/starcoder and fine-tuned to predict foreign key relationships in relational database schemas.
Training Data
Foreign key pairs extracted from SchemaPile-Perm, a large collection of relational database schemas.
Evaluation Data
We evaluate the foreign key detection accuracy of starcoder-schemapile-fk and t5-schemapile-fk on schemas from Spider, BIRD-SQL, and CTU PRLR.
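This card does not define how accuracy is computed. The following is a minimal sketch of one plausible metric: exact-match accuracy over predicted (table, column, referencedTable, referencedColumn) tuples. The case-insensitive comparison and the `ForeignKey`, `normalize`, and `exact_match_accuracy` names are hypothetical and are not taken from the evaluation code.

```python
from typing import Iterable, NamedTuple


class ForeignKey(NamedTuple):
    """One foreign key edge: table.column -> referenced_table.referenced_column."""
    table: str
    column: str
    referenced_table: str
    referenced_column: str


def normalize(fk: ForeignKey) -> ForeignKey:
    # Assumption: identifiers are compared case-insensitively, ignoring surrounding whitespace.
    return ForeignKey(*(part.strip().lower() for part in fk))


def exact_match_accuracy(predicted: Iterable[ForeignKey], gold: Iterable[ForeignKey]) -> float:
    """Fraction of predictions whose 4-tuple exactly matches the gold foreign key."""
    preds, golds = list(predicted), list(gold)
    if len(preds) != len(golds):
        raise ValueError("predicted and gold must have the same length")
    if not preds:
        raise ValueError("need at least one example")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(preds)


if __name__ == "__main__":
    gold = [ForeignKey("staff", "staff_address_id", "addresses", "address_id")]
    print(exact_match_accuracy(gold, gold))  # 1.0
```

Exact match is strict. A prediction that names the correct tables but the wrong column scores zero, so a real evaluation may also report partial matches.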
Training Procedure
The model was trained on 4x A100 40GB GPUs using DeepSpeed ZeRO-3 with offloading, with the following hyperparameters:
- learning_rate: 2.0e-05
- num_train_epochs: 3
- gradient_accumulation_steps: 8
- per_device_train_batch_size: 4
- bf16: true
- warmup_ratio: 0.03
- weight_decay: 0.0
See Training Code.
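The hyperparameters above imply an effective global batch size of 4 (per device) × 8 (accumulation steps) × 4 (GPUs) = 128 examples per optimizer step, because ZeRO-3 is a data-parallel scheme. The sketch below works through that arithmetic and a rough warmup-step estimate. It also shows one possible ZeRO-3 offload config. The card does not say whether optimizer states, parameters, or both were offloaded, so the offload settings and the `DS_ZERO3_OFFLOAD` and `approx_warmup_steps` names are assumptions. The actual settings are in the training code.

```python
import json
import math

# Hyperparameters exactly as listed in this card.
HPARAMS = {
    "learning_rate": 2.0e-05,
    "num_train_epochs": 3,
    "gradient_accumulation_steps": 8,
    "per_device_train_batch_size": 4,
    "bf16": True,
    "warmup_ratio": 0.03,
    "weight_decay": 0.0,
}
NUM_GPUS = 4  # 4x A100 40GB


def effective_batch_size(hp: dict, num_gpus: int) -> int:
    # ZeRO-3 is data parallel: each GPU runs its own micro-batch, and gradients
    # are accumulated over several micro-batches before each optimizer step.
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_gpus


def approx_warmup_steps(hp: dict, num_examples: int, num_gpus: int) -> int:
    # Rough estimate: optimizer steps per epoch times epochs, then the warmup fraction.
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(hp, num_gpus))
    total_steps = steps_per_epoch * hp["num_train_epochs"]
    return math.ceil(hp["warmup_ratio"] * total_steps)


# Assumed ZeRO-3 offload config (hypothetical, not the authors' file). "auto" values
# are filled in by the Hugging Face Trainer's DeepSpeed integration.
DS_ZERO3_OFFLOAD = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}

if __name__ == "__main__":
    print(effective_batch_size(HPARAMS, NUM_GPUS))  # 128
    print(json.dumps(DS_ZERO3_OFFLOAD, indent=2))
```

For a hypothetical dataset of 100,000 examples, this gives 782 optimizer steps per epoch, 2,346 in total, and 71 warmup steps.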
How to Use
We recommend using the prompt template below and constraining the output with jsonformer:
Example Prompt:
You are given the following SQL database tables:
staff(staff_id, staff_address_id, nickname, first_name, middle_name, last_name, date_of_birth, date_joined_staff, date_left_staff)
addresses(address_id, line_1_number_building, city, zip_postcode, state_province_county, country)
Output a json string with the following schema {table, column, referencedTable, referencedColumn} that contains the foreign key relationship between the two tables.
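The template lists each table as `name(col1, col2, ...)`, followed by a fixed instruction. A minimal sketch that renders a schema into this form follows. The line breaks between parts are an assumption, because the exact whitespace of the original template cannot be recovered from this card. The `build_fk_prompt` helper is hypothetical.

```python
from typing import Mapping, Sequence

# Instruction sentence copied from the example prompt above.
INSTRUCTION = (
    "Output a json string with the following schema "
    "{table, column, referencedTable, referencedColumn} "
    "that contains the foreign key relationship between the two tables."
)


def build_fk_prompt(tables: Mapping[str, Sequence[str]]) -> str:
    """Render tables as one `name(col1, col2, ...)` line each (newline-separated by assumption)."""
    lines = ["You are given the following SQL database tables:"]
    for name, columns in tables.items():
        lines.append(f"{name}({', '.join(columns)})")
    lines.append(INSTRUCTION)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_fk_prompt({
        "staff": ["staff_id", "staff_address_id", "nickname"],
        "addresses": ["address_id", "city"],
    }))
```

The instruction refers to "the two tables", so the template appears to be designed for one table pair per prompt.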
Example Output:
{'table': 'staff',
'column': 'staff_address_id',
'referencedTable': 'addresses',
'referencedColumn': 'address_id'}
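A minimal sketch of constrained generation with jsonformer, plus a sanity check that the predicted columns exist. It assumes the `Jsonformer(model, tokenizer, json_schema, prompt)` interface from the jsonformer package and standard `transformers` loading. `MODEL_ID` is a hypothetical placeholder, because the full Hub repo id is not given here, and the `generate_fk` and `is_valid_fk` helpers are illustrative. The heavy imports sit inside `generate_fk` so the schema and validator can be used without a GPU.

```python
from typing import Any, Dict, Mapping, Sequence

# JSON schema for jsonformer: four string fields, matching the keys in the example output.
FK_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "table": {"type": "string"},
        "column": {"type": "string"},
        "referencedTable": {"type": "string"},
        "referencedColumn": {"type": "string"},
    },
}

# Hypothetical placeholder: replace with the full Hugging Face Hub repo id of this model.
MODEL_ID = "starcoder-schemapile-fk"


def generate_fk(prompt: str, model_id: str = MODEL_ID) -> Dict[str, Any]:
    """Generate one foreign key with jsonformer (needs a GPU, torch, transformers, jsonformer)."""
    # Imported lazily so the helpers in this block work without these dependencies.
    import torch
    from jsonformer import Jsonformer
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # jsonformer caps each generated string at max_string_token_length tokens;
    # raise it so long column names are not cut off.
    return Jsonformer(model, tokenizer, FK_SCHEMA, prompt, max_string_token_length=32)()


def is_valid_fk(fk: Mapping[str, str], tables: Mapping[str, Sequence[str]]) -> bool:
    """True if both endpoints of the predicted foreign key exist in the given tables."""
    try:
        return (
            fk["column"] in tables[fk["table"]]
            and fk["referencedColumn"] in tables[fk["referencedTable"]]
        )
    except KeyError:
        return False
```

jsonformer constrains the output's structure but not its content. A check like `is_valid_fk` therefore catches predictions that name a table or column absent from the prompt.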
To run the model locally, we recommend our end-to-end Example Notebook (requires a single A100 40GB GPU).
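A rough estimate of weight memory helps explain the single-GPU requirement. This sketch assumes the fine-tuned model keeps StarCoder's 15.5B parameters, as full fine-tuning would. It counts only the weights, not activations or the KV cache.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# StarCoder's published parameter count; assumed unchanged by fine-tuning.
STARCODER_PARAMS = 15.5e9

if __name__ == "__main__":
    for dtype, nbytes in [("bf16", 2), ("fp32", 4)]:
        print(dtype, weight_memory_gb(STARCODER_PARAMS, nbytes), "GB")
```

In bf16 the weights take about 31 GB, leaving some headroom on a 40 GB card. In fp32 they would need about 62 GB and would not fit.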