License: cc-by-4.0

SchemaPile Foreign Key Detection Model (Starcoder)

Model Description

This repository introduces starcoder-schemapile-fk, a language model based on BigCode/starcoder and fine-tuned to predict foreign key relationships in relational database schemas.

Training Data

Foreign key pairs extracted from SchemaPile-Perm, a large collection of relational database schemas.

Evaluation Data

We evaluate the foreign key detection accuracy of starcoder-schemapile-fk and t5-schemapile-fk on schemas from Spider, BIRD-SQL, and CTU PRLR.

[Figure: evaluation results]

Training Procedure

The model was trained on 4x A100 40GB GPUs using DeepSpeed ZeRO-3 offloading, with the following hyperparameters:

  • learning_rate: 2.0e-05
  • num_train_epochs: 3
  • gradient_accumulation_steps: 8
  • per_device_train_batch_size: 4
  • bf16: true
  • warmup_ratio: 0.03
  • weight_decay: 0.0
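The hyperparameters above imply an effective (global) batch size. The sketch below assumes plain data parallelism across the four GPUs. ZeRO-3 shards parameters and optimizer state, but it does not change the per-step batch arithmetic. Mapping these names onto Hugging Face `TrainingArguments` fields is our assumption and is not taken from the Training Code. The dataset sizes passed to `optimizer_steps` are hypothetical.

```python
import math

# Hyperparameters copied from the list above. The key names match Hugging Face
# `TrainingArguments` fields; that mapping is an assumption, not the authors' script.
HYPERPARAMS = {
    "learning_rate": 2.0e-05,
    "num_train_epochs": 3,
    "gradient_accumulation_steps": 8,
    "per_device_train_batch_size": 4,
    "bf16": True,
    "warmup_ratio": 0.03,
    "weight_decay": 0.0,
}
NUM_GPUS = 4  # 4x A100 40GB, as stated above


def effective_batch_size(params: dict, num_gpus: int) -> int:
    """Training sequences consumed per optimizer step under data parallelism."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_gpus
    )


def optimizer_steps(params: dict, num_gpus: int, num_examples: int) -> int:
    """Rough total optimizer steps for a hypothetical dataset size."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(params, num_gpus))
    return steps_per_epoch * params["num_train_epochs"]


print(effective_batch_size(HYPERPARAMS, NUM_GPUS))  # 128
```

With these settings, each optimizer step sees 4 × 8 × 4 = 128 sequences.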

See Training Code.

How to Use

We recommend using the provided prompt template together with jsonformer to constrain the output to JSON:

Example Prompt:

You are given the following SQL database tables: 
staff(staff_id, staff_address_id, nickname, first_name, middle_name, last_name, date_of_birth, date_joined_staff, date_left_staff)
addresses(address_id, line_1_number_building, city, zip_postcode, state_province_county, country)
Output a json string with the following schema {table, column, referencedTable, referencedColumn} that contains the foreign key relationship between the two tables.
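The template can be filled in programmatically from a table-to-columns mapping. The sketch below assumes that the line layout of the example above is the exact template, including the trailing space after the colon as it is rendered here. `build_fk_prompt` is a hypothetical helper. The template's wording ("between the two tables") suggests it is meant to be called with one pair of tables at a time.

```python
def build_fk_prompt(tables: dict[str, list[str]]) -> str:
    """Fill the foreign key prompt template for a pair of tables.

    `tables` maps table name -> ordered column names.
    """
    lines = ["You are given the following SQL database tables: "]  # trailing space as rendered
    for name, columns in tables.items():
        lines.append(f"{name}({', '.join(columns)})")
    # Plain (non-f) strings, so the braces below stay literal.
    lines.append(
        "Output a json string with the following schema "
        "{table, column, referencedTable, referencedColumn} "
        "that contains the foreign key relationship between the two tables."
    )
    return "\n".join(lines)
```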

Example Output:

{'table': 'staff',
 'column': 'staff_address_id',
 'referencedTable': 'addresses',
 'referencedColumn': 'address_id'}
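The example output above is a Python dict literal with single quotes, so strict `json.loads` would reject it if it arrived as text. The following is a post-processing step of our own, not part of the described pipeline. The hypothetical `parse_fk_output` helper accepts a dict, a JSON string, or a dict literal. It then checks that both predicted columns actually exist in the schema that was shown to the model.

```python
import ast
import json

FK_KEYS = ("table", "column", "referencedTable", "referencedColumn")


def parse_fk_output(raw, tables: dict) -> dict:
    """Parse one predicted foreign key and validate it against the schema.

    `raw` may be a dict (e.g. what jsonformer returns) or a string holding
    either JSON or a Python dict literal like the example output.
    `tables` maps table name -> column names used in the prompt.
    """
    if isinstance(raw, str):
        try:
            fk = json.loads(raw)
        except json.JSONDecodeError:
            fk = ast.literal_eval(raw)  # handles single-quoted dict literals
    else:
        fk = raw
    if not isinstance(fk, dict):
        raise ValueError(f"expected an object, got {type(fk).__name__}")

    missing = [key for key in FK_KEYS if key not in fk]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # Reject names the model invented: both ends must exist in the prompt schema.
    for table_key, column_key in (("table", "column"), ("referencedTable", "referencedColumn")):
        table, column = fk[table_key], fk[column_key]
        if column not in tables.get(table, ()):
            raise ValueError(f"{table}.{column} is not in the given schema")

    return {key: fk[key] for key in FK_KEYS}
```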

To run the model locally, we recommend using our end-to-end Example Notebook (requires a single A100 40GB GPU).
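The constrained decoding step with jsonformer can be sketched as follows. This assumes jsonformer's `Jsonformer(model, tokenizer, json_schema, prompt)` constructor, which returns a callable that produces a Python dict. The schema dict, the `predict_fk` helper, and the repository ID in the usage comment are our own placeholders and are not taken from the Example Notebook.

```python
# JSON Schema mirroring {table, column, referencedTable, referencedColumn}.
FK_JSON_SCHEMA = {
    "type": "object",
    "properties": {
        "table": {"type": "string"},
        "column": {"type": "string"},
        "referencedTable": {"type": "string"},
        "referencedColumn": {"type": "string"},
    },
}


def predict_fk(model, tokenizer, prompt: str) -> dict:
    """Generate one foreign key with output constrained to FK_JSON_SCHEMA."""
    from jsonformer import Jsonformer  # imported lazily; `pip install jsonformer`

    # Jsonformer generates only the string values; keys and braces come from the schema.
    generator = Jsonformer(model, tokenizer, FK_JSON_SCHEMA, prompt)
    return generator()  # a Python dict shaped like the example output


# Usage sketch (needs a large GPU; the repo ID is a placeholder):
# import torch
# from transformers import AutoModelForCausalLM, AutoTokenizer
# repo = "<user>/starcoder-schemapile-fk"
# tokenizer = AutoTokenizer.from_pretrained(repo)
# model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")
# print(predict_fk(model, tokenizer, prompt))
```

Constraining the output to this schema guarantees that all four keys are present. It does not guarantee that the generated names refer to real tables and columns, so checking them against the tables listed in the prompt is still worthwhile.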