---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
---

# Meta-Llama-3.1-8B-Instruct Quantized Model

This repository contains a quantized version of the **Meta-Llama-3.1-8B-Instruct** model, optimized for efficient inference and deployment. The quantization was performed by the **IPROPEL Team** at **VIT Chennai**.

## Model Overview

**Meta-Llama-3.1-8B-Instruct** is an instruction-following model developed to generate human-like text, assist with various tasks, and answer questions. With 8 billion parameters, it handles a wide range of tasks efficiently.

### Quantization Details

Quantization is a model compression technique that stores weights at lower numerical precision, reducing model size with only a small loss in output quality. The quantized version of Meta-Llama-3.1-8B-Instruct available here offers:

- **Reduced Memory Usage**: Lower RAM and GPU memory consumption.
- **Faster Inference**: Shorter inference times, enabling quicker responses in production environments.
- **Smaller Model Size**: Easier to store and deploy on devices with limited storage.
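The memory savings above can be estimated with simple arithmetic. The sketch below compares approximate weight sizes for an 8B-parameter model at a few precisions; the bits-per-weight figures for the quantized formats are illustrative assumptions (real GGUF quantization types carry slightly different per-block overhead):

```python
# Back-of-envelope weight-memory estimate for an 8B-parameter model.
# Bits-per-weight values for quantized formats are rough assumptions,
# not exact figures for any specific GGUF file.

def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 8e9  # Meta-Llama-3.1-8B-Instruct

for name, bits in [("FP16", 16.0), ("~8-bit quant", 8.5), ("~4-bit quant", 4.5)]:
    print(f"{name:>14}: ~{model_size_gb(N_PARAMS, bits):.1f} GB")
# FP16 works out to ~16 GB, while a ~4-bit quantization is ~4.5 GB --
# small enough for many consumer GPUs and laptops.
```

Activation memory and the KV cache add to these totals at runtime, so treat the numbers as a lower bound on what inference actually needs.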
### Key Features

- **Model Name**: Meta-Llama-3.1-8B-Instruct (Quantized)
- **Tool Used**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **Maintained by**: IPROPEL Team, VIT Chennai