---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
---

# Meta-Llama-3.1-8B-Instruct Quantized Model

This repository contains the quantized version of the **Meta-Llama-3.1-8B-Instruct** model, optimized for efficient inference and deployment. The quantization was performed by the **IPROPEL Team** at **VIT Chennai**.

## Model Overview

**Meta-Llama-3.1-8B-Instruct** is an instruction-tuned model from Meta's Llama 3.1 family, built to generate human-like text, follow instructions, and answer questions. With 8 billion parameters, it handles a wide range of tasks efficiently.

### Quantization Details

Quantization is a model compression technique that reduces a model's size by storing its weights at lower numerical precision, without significantly sacrificing output quality. The quantized version of Meta-Llama-3.1-8B-Instruct available here offers:

- **Reduced Memory Usage**: Lower RAM and GPU memory consumption.
- **Faster Inference**: Shorter inference times, enabling quicker responses in production environments.
- **Smaller Model Size**: Easier to store and deploy on devices with limited storage.

### Key Features

- **Model Name**: Meta-Llama-3.1-8B-Instruct (Quantized)
- **Tool Used**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **Maintained by**: IPROPEL Team, VIT Chennai
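
## Usage Example

Models quantized with llama.cpp can be loaded through the `llama-cpp-python` bindings. The sketch below is a minimal, illustrative example: it assumes the weights are distributed as a GGUF file, and the filename shown is a placeholder rather than a file confirmed to exist in this repository.

```python
# Minimal sketch using the llama-cpp-python bindings for llama.cpp.
# Install with: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    # Placeholder filename -- replace with the actual GGUF file from this repo.
    model_path="./Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to GPU if available; use 0 for CPU-only
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain quantization in one sentence."}
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

Setting `n_gpu_layers=-1` offloads the entire model to the GPU when one is available; on memory-constrained hardware, a smaller value offloads only part of the model and keeps the rest on the CPU.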