---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
datasets:
- koyeb/Apple-MLX-QA
language:
- en
library_name: transformers
license: mit
pipeline_tag: question-answering
---

# Meta-Llama-3.1-8B-Instruct-Apple-MLX

## Overview

This model is a QLoRA adapter for Meta's Llama 3.1 8B Instruct model, trained to answer questions and provide guidance on Apple's machine learning framework, MLX. Fine-tuning used the LoRA (Low-Rank Adaptation) method on a custom dataset of question-answer pairs derived from the MLX documentation.

## Dataset

Fine-tuned for a single epoch on [Apple MLX QA](https://huggingface.co/datasets/koyeb/Apple-MLX-QA); a short loading sketch appears at the end of this card.

## Installation

To use the model, install the required dependencies:

```bash
pip install peft transformers jinja2==3.1.0
```

## Usage

Here's a sample code snippet to load and interact with the model:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Attach the fine-tuned LoRA adapter and move the model to the GPU
model = PeftModel.from_pretrained(
    model,
    "koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX",
).to("cuda")

# Build the input with the chat template: a system prompt plus the user query
ids = tokenizer.apply_chat_template(
    [
        {
            "role": "system",
            "content": "You are a helpful AI coding assistant with expert knowledge of Apple's latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code.",
        },
        {
            "role": "user",
            "content": "How do you transpose a matrix in MLX?",
        },
    ],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Generate a response; do_sample=True is required for temperature to take effect
output = model.generate(
    input_ids=ids, max_new_tokens=256, do_sample=True, temperature=0.5
)

# Decode only the newly generated tokens (ids.shape[1] is the prompt length)
print(tokenizer.decode(output[0][ids.shape[1]:], skip_special_tokens=True))
```
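## Exploring the Dataset

If you want to inspect the training data, it can be loaded with the `datasets` library (`pip install datasets`). This is a minimal sketch; the `train` split name and column layout are assumptions, so print the dataset object to confirm them:

```python
from datasets import load_dataset

# Load the question-answer pairs used for fine-tuning
dataset = load_dataset("koyeb/Apple-MLX-QA", split="train")

print(dataset)     # shows the column names and number of rows
print(dataset[0])  # first question-answer pair
```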
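## Merging the Adapter (Optional)

To deploy without a runtime `peft` dependency, you can fold the adapter weights into the base model with PEFT's `merge_and_unload` and save the result as a standalone checkpoint. A minimal sketch; the output directory name is arbitrary:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and attach the adapter as in the usage example
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX")

# Fold the LoRA weights into the base model; the result is a plain
# transformers model that no longer needs peft at inference time.
merged = model.merge_and_unload()

# Save the merged model and tokenizer (hypothetical output path)
merged.save_pretrained("Meta-Llama-3.1-8B-Instruct-Apple-MLX-merged")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer.save_pretrained("Meta-Llama-3.1-8B-Instruct-Apple-MLX-merged")
```

The merged checkpoint can then be loaded directly with `AutoModelForCausalLM.from_pretrained`, at the cost of storing a full copy of the 8B weights rather than just the small adapter.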