metadata
license: llama3.1
language:
- en
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
- Model creator: Meta-Llama
- Original model: Meta-Llama-3.1-8B-Instruct
Overview
This repository contains a 4-bit AWQ (Activation-aware Weight Quantization) version of Meta-Llama-3.1-8B-Instruct, prepared for inference with LMDeploy's TurboMind engine. Storing the weights in 4 bits reduces memory use compared with the original 16-bit checkpoint, with the aim of keeping accuracy close to that of the base model.
Model Details
- Model Name: Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
- Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Quantization: 4-bit AWQ
- Engine: LMDeploy TurboMind engine
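As a rough illustration of what 4-bit AWQ means for storage, the sketch below estimates the effective bits per quantized weight. It uses the group size of 128 from the quantization command in this card and assumes, as is common for AWQ-style formats but not confirmed for this checkpoint, that each group stores one 16-bit scale and one 4-bit zero point. Layers that stay unquantized (for example embeddings) are ignored, so the real on-disk ratio will differ.

```shell
# Values taken from this card: 4-bit weights, group size 128.
W_BITS=4
GROUP_SIZE=128
# Assumption: one 16-bit scale per group, plus a zero point of W_BITS bits.
SCALE_BITS=16

# Effective bits per weight = weight bits + per-group overhead spread over the group.
EFF_BITS=$(awk -v w="$W_BITS" -v g="$GROUP_SIZE" -v s="$SCALE_BITS" \
  'BEGIN { printf "%.5f", w + (s + w) / g }')

# Compression of the quantized layers relative to 16-bit weights.
RATIO=$(awk -v e="$EFF_BITS" 'BEGIN { printf "%.2f", 16 / e }')

echo "effective bits per weight: $EFF_BITS"
echo "compression vs 16-bit: ${RATIO}x"
```

Under these assumptions the per-group metadata adds only a small fraction of a bit per weight, so the quantized layers come out a little under four times smaller than their 16-bit counterparts.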
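The model was quantized with the following LMDeploy command, where $HF_MODEL points to the base model and $WORK_DIR is the output directory for the quantized weights:
lmdeploy lite auto_awq \
  $HF_MODEL \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 2048 \
  --w-bits 4 \
  --w-group-size 128 \
  --batch-size 10 \
  --search-scale True \
  --work-dir $WORK_DIR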
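Once the quantized weights exist, they can be served with the TurboMind backend. The following is a minimal sketch, not the author's documented procedure: it assumes WORK_DIR is the directory given to --work-dir above (the default path here is hypothetical) and that lmdeploy is installed. It prints the command and launches the server only when RUN=1 is set, so it is safe to try as a dry run.

```shell
# Assumption: WORK_DIR is the --work-dir output from the quantization step;
# the fallback below is a hypothetical local path.
WORK_DIR="${WORK_DIR:-./Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit}"

# Serve through LMDeploy's OpenAI-compatible API server.
# --backend turbomind selects the TurboMind engine;
# --model-format awq tells LMDeploy the weights are AWQ-quantized.
SERVE_CMD="lmdeploy serve api_server $WORK_DIR --backend turbomind --model-format awq"
echo "$SERVE_CMD"

# Launch only if lmdeploy is available and RUN=1 is set.
if [ "${RUN:-0}" = "1" ] && command -v lmdeploy >/dev/null 2>&1; then
  lmdeploy serve api_server "$WORK_DIR" --backend turbomind --model-format awq
fi
```

Run it with `RUN=1` to start the server; without it, the snippet only shows the command that would be executed.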