Model Card for Fine-Tuned Llama 3.1
Model Name: Fine-Tuned Llama 3.1
Model Description: Fine-Tuned Llama 3.1 is a customized version of Meta’s Llama-3.1-8B model, fine-tuned on task-specific datasets using LoRA (Low-Rank Adaptation) and quantized to 4-bit precision for efficient inference. It is tuned for causal language modeling tasks, with generation parameters chosen to produce concise, context-aware responses.
Model Details:
• Model Type: Causal Language Model
• Base Model: Meta-Llama-3.1-8B
• Architecture: Transformer-based autoregressive model
• Quantization: 4-bit precision using BitsAndBytes for memory efficiency
• Training Method: LoRA fine-tuning
• Task: General language generation, conversation, text completion
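As a usage sketch (assuming the repo id listed in the model tree at the end of this card), the model can be loaded in 4-bit precision like so:

```python
# Minimal loading sketch; the repo id is taken from the model tree below,
# and device_map="auto" assumes the accelerate package is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mehdibukhari/llama3.18B-Fine-tunedByMehdi"

# 4-bit NF4 quantization with float16 compute, matching this card's settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```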
Use Cases:
• Conversational AI assistants
• Text completion
• Response generation in chatbots
• Any task that involves understanding and generating human-like text
Fine-Tuning Process:
• LoRA Configuration:
• r=8, lora_alpha=16, lora_dropout=0.05
• This setup applies low-rank adaptation so that only a small number of additional parameters are trained, keeping fine-tuning memory- and compute-efficient.
• Training Arguments:
• Batch size per device: 4
• Learning rate: 2e-4
• Training epochs: 3
• Gradient accumulation: 16 steps
• Optimizer: paged_adamw_32bit
• Fine-tuning was performed on a custom dataset using the Hugging Face Trainer, and the resulting model was pushed to the Hugging Face Hub.
• Quantization:
• Loaded in 4-bit precision (bnb_4bit) with the NF4 quantization type
• Inference uses float16 compute precision for memory efficiency (a combined configuration sketch follows this list).
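The snippet below is a minimal sketch of how the configuration described above could be assembled with transformers and peft. The output directory, dataset, and tokenization details are placeholders, not the card author's actual code.

```python
# Minimal fine-tuning sketch reflecting the configuration above; dataset,
# output directory, and tokenization details are placeholders.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-3.1-8B"

# Quantization: 4-bit NF4 with float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # standard prep for 4-bit LoRA training

# LoRA configuration: r=8, lora_alpha=16, lora_dropout=0.05
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Training arguments from this card
training_args = TrainingArguments(
    output_dir="llama3.1-finetuned",  # placeholder output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    num_train_epochs=3,
    optim="paged_adamw_32bit",
    push_to_hub=True,
)

# Placeholder: the custom dataset is not published; supply a tokenized
# prompt/response dataset here.
train_dataset = ...

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
trainer.push_to_hub()
```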
Dataset:
The fine-tuning dataset contains curated conversations and responses that focus on natural language tasks such as summarization, paraphrasing, and conversation, structured as pairs of prompts and responses.
Sample data snippet:
Prompt: "Time segment 0 to 4 seconds: The sun rises over a quiet beach." Response: ["sunrise beach", "quiet shoreline", "rising sun"]
Inference and Generation:
• Generation Config:
• penalty_alpha=0.6
• do_sample=True
• top_k=5
• temperature=0.5
• repetition_penalty=1.2
• max_new_tokens=60
This configuration aims to balance coherence with controlled creativity while keeping responses short; a minimal generation sketch follows.
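The sketch below applies these parameters, reusing the model and tokenizer loaded as in the Model Details section (the prompt is illustrative):

```python
# Generation sketch using the parameters listed above; model and tokenizer
# are assumed to be loaded as in the earlier loading sketch.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    penalty_alpha=0.6,
    do_sample=True,
    top_k=5,
    temperature=0.5,
    repetition_penalty=1.2,
    max_new_tokens=60,
)

prompt = "Summarize: The sun rises over a quiet beach."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```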
Performance:
• Hardware Requirements:
• 4-bit quantization allows the model to run on consumer-grade GPUs with efficient memory utilization.
• Inference Time:
• Response generation time varies based on prompt complexity but typically completes within 2-4 seconds on a standard GPU setup.
Limitations and Ethical Considerations:
• The model may generate biased or inappropriate content, since it was trained on publicly available datasets and can reflect biases present in that data.
• Proper filtering and human supervision are recommended for sensitive use cases, such as those involving ethical or safety-critical scenarios.
Future Work:
The model can be further fine-tuned with domain-specific datasets or adapted for tasks requiring more nuanced understanding or specialized knowledge.
Model Tree:
• Model: mehdibukhari/llama3.18B-Fine-tunedByMehdi
• Base model: meta-llama/Llama-3.1-8B