shellzero committed
Commit: 1c34a32
Parent(s): ac5691e

Update README.md

Added more details.

Files changed (1): README.md (+57 −1)
README.md CHANGED

library_name: mlx
tags:
- legal
widget:
- text: |
    <start_of_turn>user
    ## Instructions
    You are a helpful AI assistant.
    ## User
    How to make scrambled eggs?<end_of_turn>
    <start_of_turn>model
---

# shellzero/gemma2-2b-ft-law-data-tag-generation

This model was converted to MLX format from [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it).
Refer to the [original model card](https://huggingface.co/google/gemma-7b-it) for more details on the model.

Install `mlx-lm` to use it:

```zsh
pip install mlx-lm
```

The model was LoRA fine-tuned on [ymoslem/Law-StackExchange](https://huggingface.co/datasets/ymoslem/Law-StackExchange) and on synthetic data generated with GPT-4o and GPT-3.5-Turbo using the format below, for 1500 steps using `mlx`.

This fine-tune was one of our best runs on this data, achieving a high F1 score on our eval dataset (part of the NVIDIA hackathon).

```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format the question to match the dataset format we fine-tuned on."""
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(
        system_prompt, title, question
    )
```

Here's an example of the system_prompt from the dataset:

```text
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
```
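
As a quick sanity check, here's what the assembled prompt looks like when `format_prompt` is applied to this system prompt and a made-up title/question pair (the inputs are hypothetical, and the helper is repeated so the snippet runs standalone):

```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format the question to match the dataset format we fine-tuned on."""
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)

system_prompt = (
    "Read the following title and question about a legal issue and assign the "
    "most appropriate tag to it. All tags must be in lowercase, ordered "
    "lexicographically and separated by commas."
)

# Hypothetical example inputs, not taken from the dataset.
prompt = format_prompt(
    system_prompt,
    "Is a verbal agreement binding?",
    "My landlord agreed to a rent freeze verbally. Is that enforceable?",
)
print(prompt)
```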

## Loading the model using `mlx_lm`

```python
from mlx_lm import generate, load

model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=32,
)
```
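
Since the dataset's convention is lowercase tags, ordered lexicographically and separated by commas, the raw generation can be normalized before comparing against gold tags. A minimal sketch (hypothetical post-processing, not part of the model card):

```python
# Hypothetical helper (not from the model card): normalize a raw generation
# into the dataset's tag convention — lowercase, deduplicated,
# lexicographically sorted, comma-separated.
def normalize_tags(raw: str) -> str:
    raw = raw.replace("<end_of_turn>", "")  # strip the model's turn terminator
    tags = {tag.strip().lower() for tag in raw.split(",") if tag.strip()}
    return ", ".join(sorted(tags))

print(normalize_tags("Contract-Law, landlord, Verbal-Agreement<end_of_turn>"))
# → contract-law, landlord, verbal-agreement
```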