|
--- |
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
datasets: |
|
- rojas-diego/Apple-MLX-QA |
|
language: |
|
- en |
|
library_name: transformers |
|
license: mit |
|
pipeline_tag: text-generation
|
--- |
|
|
|
# Meta-Llama-3.1-8B-Instruct-Apple-MLX |
|
|
|
## Overview |
|
|
|
This model is a fine-tuned version of Meta's Llama 3.1 8B Instruct model, adapted to answer questions and provide guidance on Apple's machine learning framework, MLX. Fine-tuning was performed with LoRA (Low-Rank Adaptation) on a custom dataset of question-answer pairs derived from the MLX documentation.
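
The exact training configuration is not published on this card. As a rough illustration of the approach, a typical LoRA setup with the `peft` library looks like the sketch below; the rank, alpha, and target modules are assumptions, not the values used for this checkpoint.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative sketch only: the hyperparameters below are assumptions, not the
# settings used to train this adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # assumed rank of the low-rank update
    lora_alpha=32,                        # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```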
|
|
|
## Dataset |
|
|
|
Fine-tuned for a single epoch on [Apple MLX QA](https://huggingface.co/datasets/rojas-diego/Apple-MLX-QA).
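
To inspect the training data, the dataset can be loaded with the `datasets` library (the `train` split name below is an assumption about the dataset's layout):

```python
from datasets import load_dataset

# Load the question-answer pairs derived from the MLX documentation
dataset = load_dataset("rojas-diego/Apple-MLX-QA")

print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one question-answer pair, assuming a "train" split
```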
|
|
|
## Installation |
|
|
|
To use the model, you need to install the required dependencies: |
|
|
|
```bash
pip install torch peft transformers jinja2==3.1.0
```
|
|
|
## Usage |
|
|
|
Here's a sample code snippet to load and interact with the model:
|
|
|
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    "rojas-diego/Meta-Llama-3.1-8B-Instruct-Apple-MLX",
).to("cuda")

# Build the input with the chat template: a system prompt plus the user query
ids = tokenizer.apply_chat_template(
    [
        {
            "role": "system",
            "content": "You are a helpful AI coding assistant with expert knowledge of Apple's latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code.",
        },
        {
            "role": "user",
            "content": "How do you transpose a matrix in MLX?",
        },
    ],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Generate a response and decode only the newly generated tokens
outputs = model.generate(
    input_ids=ids, max_new_tokens=256, do_sample=True, temperature=0.5
)
print(tokenizer.decode(outputs[0][ids.shape[1]:], skip_special_tokens=True))
```
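
If you would rather ship a standalone checkpoint than load the adapter at runtime, the LoRA weights can be merged into the base model with `peft`'s `merge_and_unload`. This continues from the snippet above; the output directory name is just an example:

```python
# Merge the LoRA adapter into the base weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("Meta-Llama-3.1-8B-Instruct-Apple-MLX-merged")     # example path
tokenizer.save_pretrained("Meta-Llama-3.1-8B-Instruct-Apple-MLX-merged")  # example path
```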