---
license: mit
datasets:
- ymoslem/Law-StackExchange
language:
- en
metrics:
- f1
base_model:
- google/gemma-2-2b
library_name: mlx
tags:
- legal
widget:
- text: |
    <start_of_turn>user
    ## Instructions
    You are a helpful AI assistant.
    ## User
    How to make scrambled eggs?<end_of_turn>
    <start_of_turn>model
---

# shellzero/gemma2-2b-ft-law-data-tag-generation

This model was converted to MLX format from [`google/gemma-2-2b`](https://huggingface.co/google/gemma-2-2b). Refer to the [original model card](https://huggingface.co/google/gemma-2-2b) for more details on the base model.

```zsh
pip install mlx-lm
```

The model was LoRA fine-tuned for 1,500 steps with `mlx` on [ymoslem/Law-StackExchange](https://huggingface.co/datasets/ymoslem/Law-StackExchange) together with synthetic data generated by GPT-4o and GPT-3.5-Turbo, using the prompt format below. This fine-tune was one of the best runs on our data and achieved a high F1 score on our evaluation dataset (part of the NVIDIA hackathon).

```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format a question to match the prompt format of the fine-tuning dataset."""
    return """<start_of_turn>user
## Instructions
{}
## User
TITLE: {}
QUESTION: {}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)
```

Here's an example of the `system_prompt` from the dataset:

```text
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
```

## Loading the model using `mlx_lm`

```python
from mlx_lm import generate, load

model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")

# Hypothetical example inputs; the system prompt is the dataset instruction shown above.
system_prompt = "Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas."
title = "Is a verbal agreement legally binding?"
question = "A friend agreed to sell me his car, then changed his mind. Do I have any recourse?"

response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=32,
)
```
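
## Fine-tuning sketch

The exact training invocation is not part of this card. As a rough sketch, a comparable 1,500-step LoRA run could be launched with the `mlx_lm.lora` CLI; the `./data` path (train/valid JSONL files in the prompt format above) and all flags other than `--iters 1500` are assumptions, not the recorded command.

```zsh
# Minimal sketch, not the exact command used for this model.
# Assumes ./data contains train.jsonl and valid.jsonl in the prompt format above.
python -m mlx_lm.lora \
  --model google/gemma-2-2b \
  --train \
  --data ./data \
  --iters 1500
```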
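
## Scoring tag predictions

The card reports F1 as the metric but does not include the evaluation script. Below is a minimal, hypothetical sketch of how comma-separated tag outputs could be scored with micro-averaged F1, assuming gold labels follow the same lowercase, comma-separated convention; `parse_tags` and `micro_f1` are illustrative helpers, not part of the released code.

```python
def parse_tags(text: str) -> set[str]:
    """Normalize a comma-separated tag string into a set of lowercase tags."""
    return {tag.strip().lower() for tag in text.split(",") if tag.strip()}


def micro_f1(predictions: list[str], references: list[str]) -> float:
    """Micro-averaged F1 over predicted vs. gold tag sets."""
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        p, r = parse_tags(pred), parse_tags(ref)
        tp += len(p & r)  # tags predicted and correct
        fp += len(p - r)  # tags predicted but wrong
        fn += len(r - p)  # gold tags that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Example with illustrative predicted and gold labels:
print(micro_f1(["contract-law, verbal-agreement"], ["contract-law, oral-contract"]))  # 0.5
```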