File size: 2,936 Bytes
3b54ba5 20ece27 3b54ba5 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 |
---
base_model: deepseek-ai/deepseek-coder-6.7b-instruct
tags:
- SOLAR
- instruct
- finetune
model-index:
- name: NaturalQuery-Solar-6.7B-v0.1
results: []
license: apache-2.0
language:
- en
datasets:
- wikisql
---
# **NaturalQuery-Solar-6.7B-v0.1**
**NaturalQuery** is a LLM that can translate natural language queries to SQL based on your schema.
NaturalQuery-v0.1 is finetuned on 8k text to PostgreSQL Natural Language <> SQL pairs.
**Future Improvements**:
- Much larger training set
- More complex schemas, questions, and queries
- Reward modeling via DPO
- Benchmarking
# **Usage**
Make sure you have the correct version of the transformers library installed:
```sh
pip install transformers==4.35.2
```
### **Loading the Model**
Use the following Python code to load the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("cfahlgren1/NaturalSQL-6.7B-v0")
model = AutoModelForCausalLM.from_pretrained(
"cfahlgren1/NaturalSQL-6.7B-v0",
device_map="auto",
torch_dtype=torch.float16,
)
```
### **Generating Text**
To generate text, use the following Python code:
```python
messages=[
{ 'role': 'user', 'content': prompt}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# 32021 is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```
# **SQL Generation Template**
```
### Task
Generate a SQL query to answer the following question: `{natural language question}`
### Database Schema
The query will run on a database with the following schema:
'''
<SQL Table DDL Statements>
'''
### Answer
Here is the SQL query that answers the question: `{natural language question}`
'''sql
```
# **Example SQL Output**
### **Example Schemas**
```sql
CREATE TABLE
table_1_11545282_6 (
"No." numeric,
Nationality text,
"Years for Jazz" text
);
CREATE TABLE
table_2_17383560_1 (
Pick numeric,
Round numeric,
Player text,
"School/Club Team" text,
Position text
);
CREATE TABLE
table_1_10581768_2 (
Institution text,
Enrollment numeric,
Nickname text,
Founded numeric
);
```
**Question**: **What is the round of pick 63?**
```sql
SELECT "Round" FROM table_2_17383560_1 WHERE Pick=63;
```
**Question**: **What is the most popular position among players?**
```sql
SELECT COUNT("Position") FROM "table_2_17383560_1" GROUP BY "Position" ORDER BY COUNT("Position") DESC LIMIT 1;
```
**Question**: **What is the most recent year an institution was founded?**
```sql
SELECT MAX("Founded") FROM table_1_10581768_2;
``` |