|
--- |
|
license: cc-by-4.0 |
|
--- |
|
# SchemaPile Foreign Key Detection Model (StarCoder) |
|
|
|
## Model Description |
|
|
|
This repository introduces **starcoder-schemapile-fk**, a language model based on [BigCode/starcoder](https://huggingface.co/bigcode/starcoder) and fine-tuned to predict foreign key relationships in relational database schemas. |
|
|
|
## Training Data |
|
|
|
The model was fine-tuned on foreign key pairs extracted from [SchemaPile-Perm](https://schemapile.github.io), a large collection of relational database schemas. |
|
|
|
## Evaluation Data |
|
|
|
We evaluate the foreign key detection accuracy of [starcoder-schemapile-fk](https://huggingface.co/tdoehmen/starcoder-schemapile-fk) and [t5-schemapile-fk](https://huggingface.co/tdoehmen/t5-schemapile-fk) on schemas from [Spider](https://yale-lily.github.io/spider), [BIRD-SQL](https://bird-bench.github.io/), and [CTU PRLR](https://arxiv.org/abs/1511.03086). |
|
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/616ea71919594606318887e9/6ouh4u6PFQlY8prLrAm4l.png" alt="eval" width="400"/> |
|
|
|
## Training Procedure |
|
|
|
The model was trained on 4x A100 40GB GPUs using DeepSpeed ZeRO-3 with offloading and the following hyperparameters: |
|
|
|
- learning_rate: 2.0e-05 |
|
- num_train_epochs: 3 |
|
- gradient_accumulation_steps: 8 |
|
- per_device_train_batch_size: 4 |
|
- bf16: true |
|
- warmup_ratio: 0.03 |
|
- weight_decay: 0.0 |
|
|
|
See [Training Code](https://github.com/amsterdata/schemapile/tree/main/experiments/foreign_key_detection/starcoder_finetune). |
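As a rough aid for reading the settings above, the sketch below derives the effective batch size per optimizer step. It assumes that all four GPUs act as data-parallel workers, which the card does not state explicitly; `NUM_GPUS` and `effective_batch_size` are illustrative names, not part of the training code.

```python
# Hyperparameters as listed in this model card.
hyperparameters = {
    "learning_rate": 2.0e-05,
    "num_train_epochs": 3,
    "gradient_accumulation_steps": 8,
    "per_device_train_batch_size": 4,
    "bf16": True,
    "warmup_ratio": 0.03,
    "weight_decay": 0.0,
}

NUM_GPUS = 4  # assumption: all four A100s are data-parallel workers


def effective_batch_size(params: dict, num_devices: int) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(hyperparameters, NUM_GPUS))  # 4 * 8 * 4 = 128
```

Under that assumption, each optimizer step sees 128 sequences; with a different parallelism layout the number would change accordingly.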
|
|
|
## How to Use |
|
|
|
We recommend using the prompt template below and constraining the model output with jsonformer. |
|
|
|
Example Prompt: |
|
``` |
|
You are given the following SQL database tables: |
|
staff(staff_id, staff_address_id, nickname, first_name, middle_name, last_name, date_of_birth, date_joined_staff, date_left_staff) |
|
addresses(address_id, line_1_number_building, city, zip_postcode, state_province_county, country) |
|
Output a json string with the following schema {table, column, referencedTable, referencedColumn} that contains the foreign key relationship between the two tables. |
|
``` |
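A prompt of this shape can be assembled from a table-to-columns mapping. The following is a minimal sketch: `build_prompt` and `format_table` are hypothetical helpers, and joining the lines with single newlines is an assumption about the exact whitespace the model saw during fine-tuning.

```python
from typing import Dict, List

# Instruction text copied from the example prompt above.
INSTRUCTION = (
    "Output a json string with the following schema "
    "{table, column, referencedTable, referencedColumn} that contains "
    "the foreign key relationship between the two tables."
)


def format_table(name: str, columns: List[str]) -> str:
    """Render one table as name(col1, col2, ...)."""
    return f"{name}({', '.join(columns)})"


def build_prompt(tables: Dict[str, List[str]]) -> str:
    """Assemble the prompt: header line, one line per table, instruction."""
    lines = ["You are given the following SQL database tables:"]
    lines += [format_table(name, cols) for name, cols in tables.items()]
    lines.append(INSTRUCTION)
    return "\n".join(lines)  # assumption: single newlines between lines


tables = {
    "staff": ["staff_id", "staff_address_id", "nickname", "first_name",
              "middle_name", "last_name", "date_of_birth",
              "date_joined_staff", "date_left_staff"],
    "addresses": ["address_id", "line_1_number_building", "city",
                  "zip_postcode", "state_province_county", "country"],
}
print(build_prompt(tables))
```

Because the instruction refers to "the two tables", the sketch is meant to be called with exactly two tables per prompt.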
|
|
|
Example Output: |
|
``` |
|
{'table': 'staff', |
|
'column': 'staff_address_id', |
|
'referencedTable': 'addresses', |
|
'referencedColumn': 'address_id'} |
|
``` |
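The example output is printed as a Python dict literal (single quotes), so `json.loads` would reject it if handled as text. The sketch below parses such a string with `ast.literal_eval` and checks the prediction against the input schema; `parse_prediction` and `is_consistent` are hypothetical helpers, not part of the released code.

```python
import ast
from typing import Dict, List

EXPECTED_KEYS = {"table", "column", "referencedTable", "referencedColumn"}


def parse_prediction(text: str) -> Dict[str, str]:
    """Parse a dict-literal prediction and verify it has exactly the four keys."""
    result = ast.literal_eval(text.strip())
    if not isinstance(result, dict) or set(result) != EXPECTED_KEYS:
        raise ValueError(f"unexpected prediction format: {text!r}")
    return result


def is_consistent(pred: Dict[str, str], tables: Dict[str, List[str]]) -> bool:
    """True if both (table, column) pairs actually exist in the given schema."""
    return (
        pred["column"] in tables.get(pred["table"], [])
        and pred["referencedColumn"] in tables.get(pred["referencedTable"], [])
    )


raw = """{'table': 'staff',
 'column': 'staff_address_id',
 'referencedTable': 'addresses',
 'referencedColumn': 'address_id'}"""

schema = {
    "staff": ["staff_id", "staff_address_id"],
    "addresses": ["address_id", "city"],
}

prediction = parse_prediction(raw)
print(prediction["table"], "->", prediction["referencedTable"])
print(is_consistent(prediction, schema))
```

A consistency check like this is a cheap way to discard predictions that name a table or column not present in the prompt.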
|
|
|
To run the model locally, we recommend our end-to-end [Example Notebook](https://github.com/amsterdata/schemapile/blob/main/experiments/foreign_key_detection/starcoder-schemapile-fk-example.ipynb), which requires a single A100 40GB GPU. |
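For orientation, constrained decoding with jsonformer might look roughly like the sketch below. The JSON schema and the `predict_foreign_key` wrapper are assumptions derived from the output format above, and the linked notebook may set things up differently. The model and tokenizer are expected to be loaded by the caller, for example with `transformers`.

```python
# JSON schema mirroring the four fields of the expected output (an assumption).
FK_JSON_SCHEMA = {
    "type": "object",
    "properties": {
        "table": {"type": "string"},
        "column": {"type": "string"},
        "referencedTable": {"type": "string"},
        "referencedColumn": {"type": "string"},
    },
}


def predict_foreign_key(model, tokenizer, prompt: str) -> dict:
    """Generate one foreign key prediction constrained to FK_JSON_SCHEMA."""
    # Imported lazily so the schema can be inspected without jsonformer installed.
    from jsonformer import Jsonformer

    generator = Jsonformer(model, tokenizer, FK_JSON_SCHEMA, prompt)
    return generator()  # returns a Python dict with the schema's keys


# Usage (not executed here; needs transformers, jsonformer and a large GPU):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "tdoehmen/starcoder-schemapile-fk"
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id)
# print(predict_foreign_key(model, tokenizer, prompt))
```

Constraining generation to a fixed schema means the model only fills in the four string values, so the result always has the expected keys.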
|
|