---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
datasets:
- koyeb/Apple-MLX-QA
language:
- en
library_name: transformers
license: mit
pipeline_tag: question-answering
---
# Meta-Llama-3.1-8B-Instruct-Apple-MLX
## Overview
This model is a QLoRA adapter for Meta's Llama 3.1 8B Instruct model, trained to answer questions and provide guidance on Apple's latest machine learning framework, MLX. The fine-tuning was done using the LoRA (Low-Rank Adaptation) method on a custom dataset of question-answer pairs derived from the MLX documentation.
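To give an intuition for what a LoRA adapter is, the sketch below shows the core idea in plain NumPy: the base weight matrix stays frozen, and only a pair of small low-rank factors is trained. The dimensions, rank, and scaling factor here are illustrative, not the values used for this adapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of one linear layer (d_out x d_in)
d_out, d_in, r = 64, 64, 8          # r is the LoRA rank, with r << d
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors; only these are updated during fine-tuning
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))            # B starts at zero, so the update starts at zero
alpha = 16                          # LoRA scaling hyperparameter

# Effective weight seen at inference time
W_adapted = W + (alpha / r) * B @ A

# Trainable parameters: 2 * d * r = 1024, versus d * d = 4096 for full fine-tuning
print(A.size + B.size, W.size)
```

Because only `A` and `B` are saved, the adapter checkpoint is a small fraction of the full model's size, which is why it is distributed separately and loaded on top of the base model below.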
## Dataset
Fine-tuned on a single epoch of [Apple MLX QA](https://huggingface.co/datasets/koyeb/Apple-MLX-QA).
## Installation
To use the model, you need to install the required dependencies:
```bash
pip install torch peft transformers jinja2==3.1.0
```
## Usage
Here's a sample code snippet to load and interact with the model:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    "koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX",
).to("cuda")

# Define input using a chat template with a system prompt and user query
ids = tokenizer.apply_chat_template(
    [
        {
            "role": "system",
            "content": "You are a helpful AI coding assistant with expert knowledge of Apple's latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code.",
        },
        {
            "role": "user",
            "content": "How do you transpose a matrix in MLX?",
        },
    ],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Generate a response, strip the prompt tokens (ids.shape[-1] of them,
# not len(ids), which is the batch size), and decode the rest
output = model.generate(
    input_ids=ids, max_new_tokens=256, do_sample=True, temperature=0.5
)
print(tokenizer.decode(output[0][ids.shape[-1]:], skip_special_tokens=True))
```
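If you want to avoid the adapter indirection at inference time, or serve the model with a stack that does not support PEFT, the adapter can be folded into the base weights with PEFT's `merge_and_unload()`. This is a sketch that assumes `model` and `tokenizer` are already loaded as in the snippet above; the output directory name is arbitrary.

```python
# Merge the LoRA weights into the base model in place and drop the PEFT wrapper
merged = model.merge_and_unload()

# Save the merged model so it can be reloaded as a plain transformers checkpoint
merged.save_pretrained("llama-3.1-8b-instruct-apple-mlx-merged")
tokenizer.save_pretrained("llama-3.1-8b-instruct-apple-mlx-merged")
```

The merged checkpoint is the full size of the base model, so keep the adapter-only workflow if disk space or download time matters more than per-token latency.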