---
license: mit
datasets:
- ymoslem/Law-StackExchange
language:
- en
metrics:
- f1
base_model:
- google/gemma-2-2b
library_name: mlx
tags:
- legal
widget:
- text: |
    <start_of_turn>user
    ## Instructions
    You are a helpful AI assistant.
    ## User
    How to make scrambled eggs?<end_of_turn>
    <start_of_turn>model
---
# shellzero/gemma2-2b-ft-law-data-tag-generation
This model was converted to MLX format from [`google/gemma-2-2b`](https://huggingface.co/google/gemma-2-2b), the base model listed in the metadata above.
Refer to the [original model card](https://huggingface.co/google/gemma-2-2b) for more details on the model.
```zsh
pip install mlx-lm
```
The model was LoRA fine-tuned for 1500 steps with `mlx` on the [ymoslem/Law-StackExchange](https://huggingface.co/datasets/ymoslem/Law-StackExchange) dataset and on synthetic data generated with GPT-4o and GPT-35-Turbo, using the prompt format below.
This fine-tune was one of our best runs on this data and achieved a high F1 score on our evaluation dataset. It was built as part of the Nvidia hackathon.
```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format a title and question into the prompt layout used for fine-tuning."""
    # The template body stays flush left: any indentation inside the
    # triple-quoted string would become part of the prompt.
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(
        system_prompt, title, question
    )
```
Here's an example of the system_prompt from the dataset:
```text
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
```
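The system prompt asks for lowercase, lexicographically ordered, comma-separated tags. A minimal sketch of a helper that puts a raw tag string into that shape is shown here; `normalize_tags` is a hypothetical name, and the bare `,` separator (no trailing space) is an assumption, since the exact separator used in the training targets is not stated in this card.

```python
def normalize_tags(raw: str) -> str:
    """Lowercase, de-duplicate, sort, and comma-join a raw tag string.

    Assumption: tags are joined with a bare comma; adjust the separator
    if your reference data uses ", " instead.
    """
    # Split on commas, trim whitespace, drop empty entries, and lowercase.
    tags = {t.strip().lower() for t in raw.split(",") if t.strip()}
    # sorted() on plain strings gives lexicographic order.
    return ",".join(sorted(tags))


print(normalize_tags("Contract-Law, copyright , Contract-law"))
# contract-law,copyright
```

The same helper can be applied to model output before comparing it with reference tags, so that spacing or casing differences do not count as errors.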
## Loading the model using `mlx_lm`
```python
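from mlx_lm import generate, load

# Download (or load from the local cache) the fine-tuned weights and tokenizer.
model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")

# `system_prompt`, `title`, and `question` are strings you supply;
# `format_prompt` is the helper defined above and takes all three.
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,  # Greedy decoding
    max_tokens=32,
)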
```
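Since the card lists F1 as its metric, one way to score a generated tag string against reference tags is a per-example, set-based F1. The sketch below is an illustration under that assumption only; this card does not state how the reported F1 was actually computed, and `parse_tags` and `tag_f1` are hypothetical helper names.

```python
def parse_tags(text: str) -> set[str]:
    """Turn a comma-separated tag string into a set of lowercase tags."""
    return {t.strip().lower() for t in text.split(",") if t.strip()}


def tag_f1(predicted: str, reference: str) -> float:
    """Set-based F1 between predicted and reference tag strings."""
    pred, ref = parse_tags(predicted), parse_tags(reference)
    if not pred and not ref:
        return 1.0  # Both empty: treat as a perfect match.
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # No shared tags (also covers one side being empty).
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(tag_f1("contract-law,copyright", "copyright,licensing"))
# 0.5
```

Averaging `tag_f1` over an evaluation set gives a single score; other conventions (for example micro-averaged F1 over all tags) would give different numbers.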