|
--- |
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
datasets: |
|
- rojas-diego/Apple-MLX-QA |
|
language: |
|
- en |
|
library_name: transformers |
|
license: mit |
|
pipeline_tag: text-generation
|
--- |
|
|
|
# Meta-Llama-3.1-8B-Instruct-Apple-MLX |
|
|
|
## Overview |
|
|
|
This model is a fine-tuned version of Meta's Llama 3.1 8B Instruct model, adapted to answer questions and provide guidance on Apple's machine learning framework, MLX. Fine-tuning was performed with LoRA (Low-Rank Adaptation) on a custom dataset of question-answer pairs derived from the MLX documentation.
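
The exact training configuration is not published on this card. As a rough illustration of the approach, a typical LoRA setup with the `peft` library looks like the sketch below; the rank, alpha, and target modules are assumptions, not the values used for this checkpoint.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative sketch only: the hyperparameters below are assumptions, not the
# settings used to train this adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # assumed rank of the low-rank update
    lora_alpha=32,                        # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```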
|
|
|
## Dataset |
|
|
|
Fine-tuned for a single epoch on [Apple MLX QA](https://huggingface.co/datasets/rojas-diego/Apple-MLX-QA).
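
To inspect the training data, the dataset can be loaded with the `datasets` library (the `train` split name below is an assumption about the dataset's layout):

```python
from datasets import load_dataset

# Load the question-answer pairs derived from the MLX documentation
dataset = load_dataset("rojas-diego/Apple-MLX-QA")

print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one question-answer pair, assuming a "train" split
```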
|
|
|
## Installation |
|
|
|
To use the model, you need to install the required dependencies: |
|
|
|
```bash
pip install torch peft transformers jinja2==3.1.0
```
|
|
|
## Usage |
|
|
|
Here's a sample code snippet to load and interact with the model:
|
|
|
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    "rojas-diego/Meta-Llama-3.1-8B-Instruct-Apple-MLX",
).to("cuda")

# Build the input with the chat template: a system prompt plus the user query
ids = tokenizer.apply_chat_template(
    [
        {
            "role": "system",
            "content": "You are a helpful AI coding assistant with expert knowledge of Apple's latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code.",
        },
        {
            "role": "user",
            "content": "How do you transpose a matrix in MLX?",
        },
    ],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Generate a response and decode only the newly generated tokens
outputs = model.generate(
    input_ids=ids, max_new_tokens=256, do_sample=True, temperature=0.5
)
print(tokenizer.decode(outputs[0][ids.shape[1]:], skip_special_tokens=True))
```
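
If you would rather ship a standalone checkpoint than load the adapter at runtime, the LoRA weights can be merged into the base model with `peft`'s `merge_and_unload`. This continues from the snippet above; the output directory name is just an example:

```python
# Merge the LoRA adapter into the base weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("Meta-Llama-3.1-8B-Instruct-Apple-MLX-merged")     # example path
tokenizer.save_pretrained("Meta-Llama-3.1-8B-Instruct-Apple-MLX-merged")  # example path
```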