cfahlgren1 (HF staff) committed
Commit
3b54ba5
1 Parent(s): de7446f

Create README.md

  1. README.md +127 -0
README.md ADDED
@@ -0,0 +1,127 @@
+ ---
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ tags:
+ - SOLAR
+ - instruct
+ - finetune
+ model-index:
+ - name: NaturalQuery-Solar-6.7B-v0.1
+   results: []
+ license: apache-2.0
+ language:
+ - en
+ datasets:
+ - wikisql
+ ---
+
+ # **NaturalQuery-Solar-6.7B-v0.1**
+
+ **NaturalQuery** is an LLM that translates natural language questions into SQL queries based on your database schema.
+
+ NaturalQuery-v0.1 is fine-tuned on 8k pairs of natural language questions and PostgreSQL queries.
+
+ **Future Improvements**:
+
+ - Much larger training set
+ - More complex schemas, questions, and queries
+ - Reward modeling via DPO
+ - Benchmarking
+
+ # **Usage**
+
+ Make sure you have the correct version of the transformers library installed:
+
+ ```sh
+ pip install transformers==4.35.2
+ ```
+
+ ### **Loading the Model**
+
+ Use the following Python code to load the model:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("cfahlgren1/NaturalSQL-6.7B-v0")
+ # device_map="auto" places the weights on the available device(s);
+ # float16 halves memory use relative to float32.
+ model = AutoModelForCausalLM.from_pretrained(
+     "cfahlgren1/NaturalSQL-6.7B-v0",
+     device_map="auto",
+     torch_dtype=torch.float16,
+ )
+ ```
+
+ ### **Generating Text**
+
+ To generate text, use the following Python code:
+
+ ```python
+ text = "Hi, my name is "
+ # Move the input tensors to the device the model was loaded on
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ # **SQL Generation Template**
+
+ ```
+ ### Task
+
+ Generate a SQL query to answer the following question: `{natural language question}`
+
+ ### Database Schema
+
+ The query will run on a database with the following schema:
+
+ '''
+ <SQL Table DDL Statements>
+ '''
+
+ ### Answer
+ Here is the SQL query that answers the question: `{natural language question}`
+ '''sql
+ ```
+
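The template above can be filled in programmatically. The following is a minimal sketch rather than the author's own code: `build_prompt` and `extract_sql` are hypothetical helper names, and it assumes that the `'''` markers in the template stand in for triple backticks, which cannot be nested inside the README's own code fence.

```python
# Assumption: the template's ''' markers represent triple backticks.
FENCE = "`" * 3

PROMPT_TEMPLATE = """### Task

Generate a SQL query to answer the following question: `{question}`

### Database Schema

The query will run on a database with the following schema:

{fence}
{schema}
{fence}

### Answer
Here is the SQL query that answers the question: `{question}`
{fence}sql
"""


def build_prompt(question: str, schema: str) -> str:
    """Fill the SQL generation template with a question and table DDL."""
    return PROMPT_TEMPLATE.format(
        question=question.strip(), schema=schema.strip(), fence=FENCE
    )


def extract_sql(completion: str) -> str:
    """Keep only the text before the closing fence of the model's answer."""
    return completion.split(FENCE, 1)[0].strip()


# DDL copied from the example schemas in this README
schema = """CREATE TABLE table_2_17383560_1 (
    Pick numeric,
    Round numeric,
    Player text,
    "School/Club Team" text,
    Position text
);"""

prompt = build_prompt("What is the round of pick 63?", schema)
print(prompt)
```

The resulting `prompt` can be passed to the tokenizer and `model.generate` in place of the sample text from the "Generating Text" section. Because the prompt ends with an opening SQL fence, the model is expected to continue with the query and then close the fence; `extract_sql` is meant to be applied to the newly generated text only, not to the prompt plus completion.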
+ # **Example SQL Output**
+
+ ### **Example Schemas**
+
+ ```sql
+ CREATE TABLE
+   table_1_11545282_6 (
+     "No." numeric,
+     Nationality text,
+     "Years for Jazz" text
+   );
+
+ CREATE TABLE
+   table_2_17383560_1 (
+     Pick numeric,
+     Round numeric,
+     Player text,
+     "School/Club Team" text,
+     Position text
+   );
+
+ CREATE TABLE
+   table_1_10581768_2 (
+     Institution text,
+     Enrollment numeric,
+     Nickname text,
+     Founded numeric
+   );
+ ```
+
+ **Question**: **What is the round of pick 63?**
+ ```sql
+ SELECT "Round" FROM table_2_17383560_1 WHERE Pick=63;
+ ```
+
+ **Question**: **What is the most popular position among players?**
+ ```sql
+ SELECT COUNT("Position") FROM "table_2_17383560_1" GROUP BY "Position" ORDER BY COUNT("Position") DESC LIMIT 1;
+ ```
+
+ **Question**: **What is the most recent year an institution was founded?**
+ ```sql
+ SELECT MAX("Founded") FROM table_1_10581768_2;
+ ```
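Generated queries like the ones above can be smoke-tested before they reach a real database. The sketch below is one possible approach, not part of the README: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and `executes` is a hypothetical helper name. The two dialects differ, so a query that passes here may still fail in PostgreSQL. In particular, PostgreSQL folds unquoted identifiers to lowercase and treats quoted ones as case-sensitive, while SQLite matches identifiers case-insensitively.

```python
import sqlite3

# DDL copied from the example schemas in this README
SCHEMA = """
CREATE TABLE table_2_17383560_1 (
    Pick numeric,
    Round numeric,
    Player text,
    "School/Club Team" text,
    Position text
);
CREATE TABLE table_1_10581768_2 (
    Institution text,
    Enrollment numeric,
    Nickname text,
    Founded numeric
);
"""


def executes(schema: str, query: str) -> bool:
    """Return True if `query` runs without error on empty tables built from `schema`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute(query).fetchall()
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


queries = [
    'SELECT "Round" FROM table_2_17383560_1 WHERE Pick=63;',
    'SELECT MAX("Founded") FROM table_1_10581768_2;',
    "SELECT Missing FROM table_1_10581768_2;",  # deliberately invalid column
]
results = [executes(SCHEMA, q) for q in queries]
print(results)
```

The tables are left empty on purpose: the check only confirms that a query parses and references existing tables and columns, not that it returns the right answer.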