|
---
license: mit
datasets:
- ymoslem/Law-StackExchange
language:
- en
metrics:
- f1
base_model:
- google/gemma-2-2b
library_name: mlx
tags:
- legal
widget:
- text: |
    <start_of_turn>user
    ## Instructions
    You are a helpful AI assistant.
    ## User
    How to make scrambled eggs?<end_of_turn>
    <start_of_turn>model
---
|
# shellzero/gemma2-2b-ft-law-data-tag-generation |
|
This model was converted to MLX format from [`google/gemma-2-2b`](https://huggingface.co/google/gemma-2-2b).

Refer to the [original model card](https://huggingface.co/google/gemma-2-2b) for more details on the base model.
|
|
|
Install `mlx-lm` to use the model:

```zsh
pip install mlx-lm
```
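
Converting a Hugging Face checkpoint to MLX format is handled by `mlx-lm`'s convert utility. A minimal sketch of that step, assuming a non-quantized conversion and a local output path; this is illustrative, not the exact command used to produce this repository:

```zsh
# Hypothetical conversion of the base model to MLX format (illustrative only).
python -m mlx_lm.convert \
  --hf-path google/gemma-2-2b \
  --mlx-path ./gemma-2-2b-mlx
```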
|
|
|
The model was LoRA fine-tuned on [ymoslem/Law-StackExchange](https://huggingface.co/datasets/ymoslem/Law-StackExchange) and on synthetic data generated with GPT-4o and GPT-3.5-Turbo, using the prompt format below, for 1500 steps with `mlx`.
|
|
|
This fine-tune was one of the best runs with our data and achieved a high F1 score on our eval dataset (part of the Nvidia hackathon).
|
|
|
```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format a title and question into the prompt format the model was fine-tuned on."""
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(
        system_prompt, title, question
    )
```
|
|
|
Here's an example of the `system_prompt` from the dataset:

```text
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
```
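
The fine-tuning itself was run with `mlx`; a LoRA run of this kind is typically launched through `mlx-lm`'s LoRA recipe. The sketch below is a reconstruction under stated assumptions: only the base model and the 1500-step count come from this card, while the data path, batch size, and adapter path are placeholders.

```zsh
# Hypothetical LoRA fine-tuning command. Only --model and --iters reflect this card;
# ./law-tag-data stands for a directory with train.jsonl / valid.jsonl in the prompt format above.
python -m mlx_lm.lora \
  --model google/gemma-2-2b \
  --train \
  --data ./law-tag-data \
  --iters 1500 \
  --batch-size 4 \
  --adapter-path ./adapters
```

The resulting adapters can then be fused back into the base model with `mlx_lm.fuse`, which is how a standalone repository like this one is typically produced.
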
|
## Loading the model using `mlx_lm` |
|
|
|
```python
from mlx_lm import generate, load

model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")

# Example inputs: the system prompt is the one shown above; the title and question are illustrative.
system_prompt = (
    "Read the following title and question about a legal issue and assign the most appropriate tag to it. "
    "All tags must be in lowercase, ordered lexicographically and separated by commas."
)
title = "Is a verbal agreement to rent an apartment legally binding?"
question = "My landlord and I never signed a lease, but we agreed on the rent over the phone. Is that enforceable?"

response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=32,
)
```
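
The same generation can also be run from the command line with the `mlx_lm.generate` entry point. A minimal sketch, assuming `prompt.txt` holds a prompt built with `format_prompt` above:

```zsh
# Hypothetical CLI invocation; prompt.txt is assumed to contain a fully formatted prompt.
python -m mlx_lm.generate \
  --model shellzero/gemma2-2b-ft-law-data-tag-generation \
  --max-tokens 32 \
  --prompt "$(cat prompt.txt)"
```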