Update README.md
Added more details.
README.md
CHANGED
library_name: mlx
tags:
- legal
widget:
- text: |
    <start_of_turn>user
    ## Instructions
    You are a helpful AI assistant.
    ## User
    How to make scrambled eggs?<end_of_turn>
    <start_of_turn>model
---
# shellzero/gemma2-2b-ft-law-data-tag-generation

This model was converted to MLX format from [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it).
Refer to the [original model card](https://huggingface.co/google/gemma-7b-it) for more details on the model.

```zsh
pip install mlx-lm
```
The model was LoRA fine-tuned for 1500 steps with `mlx` on [ymoslem/Law-StackExchange](https://huggingface.co/datasets/ymoslem/Law-StackExchange) plus synthetic data generated by GPT-4o and GPT-3.5-Turbo, using the prompt format below.

This fine-tune was one of our best runs on this data and achieved a high F1 score on our eval dataset (part of the Nvidia hackathon).
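For reference, one way to render a training example into this single-turn chat format is sketched below. This is an assumption-laden sketch, not part of the original card: it presumes the LoRA trainer is fed a JSONL file of `{"text": ...}` records, and the helper name and sample values are illustrative.

```python
import json


def to_training_record(system_prompt: str, title: str, question: str, tags: list) -> str:
    """Render one Q&A pair as a single-turn Gemma chat string, with the
    gold tags as the model turn, serialized as one JSONL line.
    (Illustrative helper; the {"text": ...} record shape is an assumption.)"""
    text = (
        "<bos><start_of_turn>user\n"
        "## Instructions\n{}\n"
        "## User\nTITLE:\n{}\nQUESTION:\n{}<end_of_turn>\n"
        "<start_of_turn>model\n{}<end_of_turn>"
    ).format(system_prompt, title, question, ", ".join(sorted(tags)))
    return json.dumps({"text": text})


# Invented sample values, for illustration only.
record = to_training_record(
    "Assign the most appropriate tag.",
    "Is a verbal agreement binding?",
    "My landlord made a promise verbally. Is it enforceable?",
    ["landlord", "contract-law"],
)
```

Note the tags are sorted before joining, matching the lexicographic ordering the task demands.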
```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format a question into the prompt format the model was fine-tuned on."""
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)
```
Here's an example of the `system_prompt` from the dataset:

```text
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
```
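As a quick sanity check, the helper can be exercised with that system prompt. The helper is repeated here so the snippet runs on its own, and the title and question are invented examples:

```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    # Same helper as above, repeated so this snippet is self-contained.
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)


system_prompt = (
    "Read the following title and question about a legal issue and assign "
    "the most appropriate tag to it. All tags must be in lowercase, ordered "
    "lexicographically and separated by commas."
)

# Invented example inputs.
prompt = format_prompt(
    system_prompt,
    "Is a verbal rent-freeze promise enforceable?",
    "My landlord verbally promised not to raise the rent. Does that bind them?",
)
print(prompt.splitlines()[0])  # -> <bos><start_of_turn>user
```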
## Loading the model using `mlx_lm`

```python
from mlx_lm import generate, load

model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=32,
)
```
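The system prompt asks for lowercase, lexicographically ordered, comma-separated tags, but a raw completion may not be perfectly normalized. A minimal post-processing sketch, not part of the original card (the helper name is ours):

```python
def normalize_tags(completion: str) -> str:
    """Normalize a raw completion into lowercase, deduplicated,
    lexicographically sorted, comma-separated tags."""
    tags = {t.strip().lower() for t in completion.split(",") if t.strip()}
    return ", ".join(sorted(tags))


print(normalize_tags("Contract-Law, landlord , contract-law"))
# -> contract-law, landlord
```

Deduplicating through a set also guards against the model repeating a tag.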