metadata
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
Meta-Llama-3.1-8B-Instruct Quantized Model
This repository contains a quantized version of the Meta-Llama-3.1-8B-Instruct model, intended for efficient inference and deployment. The quantization was performed by the IPROPEL Team at VIT Chennai.
Model Overview
Meta-Llama-3.1-8B-Instruct is an instruction-tuned language model from Meta with 8 billion parameters. It is designed to follow natural-language instructions, answer questions, and generate text across a wide range of tasks.
Quantization Details
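Quantization is a model compression technique that stores the model's weights at lower numeric precision (for example, a few bits per weight instead of 16), which shrinks the model while usually costing only a small amount of output quality. The quantized version of the Meta-Llama-3.1-8B-Instruct model available here allows for: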
- Reduced Memory Usage: Lower RAM and GPU memory consumption.
- Faster Inference: Typically lower latency per response, which helps in production environments.
- Smaller Model Size: Easier to store and deploy on devices with limited storage.
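To make the trade-off above concrete, here is a minimal sketch of what quantization does to a small block of weights and how the bit width affects the size of the weights. It is an illustration only: the symmetric absmax scheme, the function names, and the 16-bit and 4-bit widths are assumptions chosen for the example. They are not the format llama.cpp uses internally, nor the quantization level of the files in this repository, which is not stated here.

```python
def quantize_block(weights, bits=4):
    """Symmetric absmax quantization of one block of floats (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1                      # largest code, e.g. 7 for 4 bits
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0    # one float scale per block
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate float weights from integer codes and the block scale."""
    return [c * scale for c in codes]


def weights_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical weights, used only to show the round trip.
original = [0.12, -0.5, 0.33, 0.07]
codes, scale = quantize_block(original, bits=4)
restored = dequantize_block(codes, scale)
print("codes:   ", codes)
print("restored:", [round(w, 3) for w in restored])

# 8 billion parameters at two assumed bit widths.
print("16-bit weights:", weights_size_gb(8e9, 16), "GB")
print(" 4-bit weights:", weights_size_gb(8e9, 4), "GB")
```

Each restored weight differs from the original by at most half the block scale, which is where the small quality loss comes from. Real quantized files also store per-block scales and metadata, so actual file sizes are somewhat larger than this estimate.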
Key Features
- Model Name: Meta-Llama-3.1-8B-Instruct (Quantized)
- Tool Used: llama.cpp
- Maintained by: IPROPEL Team, VIT Chennai
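Since the model was quantized with llama.cpp, it can presumably be run with the llama.cpp command-line tool. The sketch below builds such a command from Python. It assumes a local llama.cpp build that provides the `llama-cli` binary, and `model.gguf` is a hypothetical placeholder for whichever file you download from this repository, because the actual file name is not given here. The helper names are also illustrative.

```python
import shlex
import subprocess


def build_llama_cli_command(model_path, prompt, max_tokens=128, binary="llama-cli"):
    """Build an argument list for llama.cpp's CLI.

    -m selects the model file, -p sets the prompt, and -n limits the number
    of tokens to generate.
    """
    return [binary, "-m", model_path, "-p", prompt, "-n", str(max_tokens)]


def run_command(cmd):
    """Run the command and return its standard output (requires llama.cpp installed)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# "model.gguf" is a placeholder path, not the real file name from this repository.
cmd = build_llama_cli_command("model.gguf", "Explain quantization in one sentence.")
print(shlex.join(cmd))
# To actually generate text, call run_command(cmd) once the model file is in place.
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the prompt contains spaces or special characters.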