---
license: apache-2.0
language:
- ar
- en
tags:
- llama3.1
- arabic
- pretrained
- lora
library_name: peft
pipeline_tag: text-generation
---

# 🚀 Arabic LLaMA 3.1 LoRA Model (Version 1)

The Arabic LLaMA 3.1 LoRA Model is a fine-tuned model based on the newly released LLaMA 3.1, trained on the [Arabic BigScience xP3](https://huggingface.co/datasets/M-A-D/Mixed-Arabic-Datasets-Repo/viewer/Ara--bigscience--xP3) dataset.

## Model Summary

- **Model Type:** LLaMA 3.1 LoRA model
- **Language(s):** Arabic
- **Base Model:** [unsloth/Meta-Llama-3.1-8B](https://huggingface.co/unsloth/Meta-Llama-3.1-8B)

## Model Details

- The model was fine-tuned in 4-bit precision using [unsloth](https://github.com/unslothai/unsloth) for 16k steps on a single GPU.
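The exact training script and hyperparameters are not published with this card. The sketch below shows roughly how such a 4-bit LoRA run with unsloth could be set up; the dataset configuration name, the `inputs`/`targets` column names, and all hyperparameters (LoRA rank, learning rate, batch sizes, etc.) are illustrative assumptions, not the settings used for this release.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
import torch

# Load the base model in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are assumptions)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Arabic BigScience xP3 subset (config and column names assumed from the dataset viewer)
dataset = load_dataset("M-A-D/Mixed-Arabic-Datasets-Repo", "Ara--bigscience--xP3", split="train")

prompt_template = """Input: {}
Response: {}"""

def to_text(batch):
    texts = [prompt_template.format(i, t) + tokenizer.eos_token
             for i, t in zip(batch["inputs"], batch["targets"])]
    return {"text": texts}

dataset = dataset.map(to_text, batched=True)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=16000,  # roughly the 16k steps mentioned above
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        optim="adamw_8bit",
        logging_steps=50,
        output_dir="outputs",
    ),
)
trainer.train()
```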
## Gradio App

I prepared a Gradio app so you can run inference with the model and compare its results with the base LLaMA 3.1 model. Just run the following code in Colab.

### Gradio App (a Colab T4 GPU is enough to run the app)

```python
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
!pip install gradio

import gradio as gr
from unsloth import FastLanguageModel
import torch

# Load base model and tokenizer
base_model, base_tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(base_model)  # Enable native 2x faster inference

# Load LoRA model and tokenizer
lora_model, lora_tokenizer = FastLanguageModel.from_pretrained(
    model_name="Omartificial-Intelligence-Space/Arabic-llama3.1-lora-FT",  # Replace with your LoRA model path/name
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(lora_model)  # Enable native 2x faster inference

simplified_prompt = """Input: {}
Response: {}"""

def extract_response(text):
    """Extracts the Response part from the generated text."""
    response_marker = "Response:"
    if response_marker in text:
        return text.split(response_marker, 1)[1].strip()
    return text.strip()

def generate_responses(input_text):
    prompt = simplified_prompt.format(input_text, "")

    # Tokenize input for base model
    base_inputs = base_tokenizer([prompt], return_tensors="pt").to("cuda")
    # Generate output using base model
    base_outputs = base_model.generate(**base_inputs, max_new_tokens=128, use_cache=True)
    # Decode base model output
    base_decoded_outputs = base_tokenizer.batch_decode(base_outputs, skip_special_tokens=True)[0]
    base_response = extract_response(base_decoded_outputs)

    # Tokenize input for LoRA model
    lora_inputs = lora_tokenizer([prompt], return_tensors="pt").to("cuda")
    # Generate output using LoRA model
    lora_outputs = lora_model.generate(**lora_inputs, max_new_tokens=128, use_cache=True)
    # Decode LoRA model output
    lora_decoded_outputs = lora_tokenizer.batch_decode(lora_outputs, skip_special_tokens=True)[0]
    lora_response = extract_response(lora_decoded_outputs)

    return base_response, lora_response

# Custom CSS for the interface
css = """
h1 { color: #1E90FF; font-family: 'Arial', sans-serif; text-align: center; margin-bottom: 20px; }
.description { color: #4682B4; font-family: 'Arial', sans-serif; text-align: center; font-size: 18px; margin-bottom: 20px; }
.gradio-container { background-color: #F0F0F0; border-radius: 10px; padding: 20px; }
.gr-button { background-color: #FFA500; color: white; border: none; padding: 10px 20px; text-align: center; display: inline-block; font-size: 16px; margin: 4px 2px; cursor: pointer; }
.gr-button:hover { background-color: #FF8C00; }
.gr-textbox { border: 2px solid #1E90FF; border-radius: 5px; padding: 10px; }
"""

# JavaScript that animates the page title letter by letter
js = """
function createGradioAnimation() {
    var container = document.createElement('div');
    container.id = 'gradio-animation';
    container.style.fontSize = '2em';
    container.style.fontWeight = 'bold';
    container.style.textAlign = 'center';
    container.style.marginBottom = '20px';

    var text = 'Arabic-LLama3.1-Lora Model';
    for (var i = 0; i < text.length; i++) {
        (function(i){
            setTimeout(function(){
                var letter = document.createElement('span');
                letter.style.opacity = '0';
                letter.style.transition = 'opacity 0.5s';
                letter.innerText = text[i];
                container.appendChild(letter);
                setTimeout(function() {
                    letter.style.opacity = '1';
                }, 50);
            }, i * 250);
        })(i);
    }

    var gradioContainer = document.querySelector('.gradio-container');
    gradioContainer.insertBefore(container, gradioContainer.firstChild);

    return 'Animation created';
}
"""

with gr.Blocks(css=css, js=js) as demo:
    gr.Markdown("This model is the Arabic version of Llama3.1, utilized to answer in Arabic for different types of prompts.")
    with gr.Row():
        input_text = gr.Textbox(lines=5, placeholder="Enter input text here...", elem_classes="gr-textbox")
        base_output = gr.Textbox(label="Base Model Output", elem_classes="gr-textbox")
        lora_output = gr.Textbox(label="LoRA Model Output", elem_classes="gr-textbox")
    generate_button = gr.Button("Generate Responses", elem_classes="gr-button")
    generate_button.click(generate_responses, inputs=input_text, outputs=[base_output, lora_output])

demo.launch(debug=True)
```

### Recommendations

- [unsloth](https://github.com/unslothai/unsloth) for fine-tuning models. You can get a 2x faster fine-tuned model, which can be exported to any format or uploaded to Hugging Face.

## Acknowledgments

The author would like to thank Prince Sultan University for their invaluable support of this project. Their contributions and resources have been instrumental in the development and fine-tuning of these models.

## Citation

If you use the Arabic LLaMA 3.1 LoRA Model, please cite it as follows:

```bibtex
@model{nacar2024,
  author  = {Omer Nacar},
  title   = {Arabic llama3.1 Lora Model},
  year    = 2024,
  url     = {https://huggingface.co/Omartificial-Intelligence-Space/Arabic-llama3.1-Chat-lora},
  version = {1.0.0},
}
```