Model Card for Llama2-7b-chat-HF-NF4

Model Details

This model is an NF4 (4-bit NormalFloat) quantized version of the meta-llama/Llama-2-7b-chat-hf model.
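To make "NF4 quantized" concrete, here is a minimal pure-Python sketch of blockwise 4-bit quantization. It is an illustration, not the bitsandbytes implementation: the codebook values are the NF4 levels rounded to four decimals, and the helper names (`quantize_block`, `pack`, `unpack`, `dequantize_block`) and the high-nibble-first packing order are assumptions made for this example.

```python
# NF4 codebook, rounded to 4 decimals (16 levels, so each code fits in 4 bits).
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_block(weights):
    """Return (absmax, codes): one scale per block plus a 4-bit code per weight."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    codes = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return absmax, codes


def pack(codes):
    """Pack two 4-bit codes into each byte (hypothetical high-nibble-first order)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length input
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack(packed, n):
    """Recover the first n 4-bit codes from packed bytes."""
    codes = []
    for b in packed:
        codes += [b >> 4, b & 0x0F]
    return codes[:n]


def dequantize_block(absmax, codes):
    """Map codes back to approximate weights."""
    return [NF4_LEVELS[c] * absmax for c in codes]
```

Each weight is stored as a 4-bit index into the codebook plus one shared scale per block, which is why a quantized checkpoint holds packed 8-bit integers alongside a smaller number of floating-point tensors.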

Developed by: Ted Whooley
Library: Transformers, NF4
Model type: llama
Model name: Llama2-7b-chat-HF-NF4
Pipeline tag: text-generation
Quantized by: twhoool02
Language(s) (NLP): en
License: other
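
The card does not say how the checkpoint was produced or how to load it. A comparable NF4 model can probably be obtained by quantizing the base model at load time with Transformers and bitsandbytes. The sketch below assumes that `bitsandbytes` and `accelerate` are installed, that a CUDA GPU is available, and that you have access to the gated `meta-llama/Llama-2-7b-chat-hf` repository. The helper names `nf4_settings` and `load_nf4_model` and the float16 compute dtype are choices made for this example and are not taken from the card.

```python
BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"  # base model named in this card


def nf4_settings():
    """Quantization options passed to BitsAndBytesConfig for 4-bit NF4 loading."""
    return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}


def load_nf4_model(model_id=BASE_MODEL):
    """Load a causal LM with NF4 weights; the heavy imports are kept local."""
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16,  # assumed; not stated in the card
        **nf4_settings(),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=config,
        device_map="auto",  # let accelerate place the layers
    )
    return tokenizer, model
```

Calling `load_nf4_model()` downloads the base weights and quantizes them on the fly, so it needs network access and enough disk space for the full-precision checkpoint.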

Downloads last month: 5

Safetensors

Model size: 3.6B params
Tensor type: F32 · FP16 · U8
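
The reported size of 3.6B params can look surprising for a 7B model. A plausible explanation is that the count refers to stored tensor elements, with two 4-bit weights packed into each U8 byte. The sketch below works through that arithmetic using the published Llama-2-7b dimensions. It assumes that only the linear layers are quantized, that embeddings, the output head, and norms stay in FP16, and that one F32 scale is stored per block of 64 weights with no double quantization. None of these settings is confirmed by the card.

```python
# Llama-2-7b architecture dimensions.
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 4096, 11008, 32, 32000
BLOCK_SIZE = 64  # assumed quantization block size


def stored_elements():
    """Estimate the number of stored tensor elements per dtype after NF4 packing."""
    attn = 4 * HIDDEN * HIDDEN        # q, k, v, o projections
    mlp = 3 * HIDDEN * INTERMEDIATE   # gate, up, down projections
    linear = LAYERS * (attn + mlp)    # weights assumed to be quantized
    return {
        "U8": linear // 2,            # two 4-bit codes per byte
        "FP16": 2 * VOCAB * HIDDEN + (2 * LAYERS + 1) * HIDDEN,  # embed, head, norms
        "F32": linear // BLOCK_SIZE,  # one scale per block
    }


counts = stored_elements()
total = sum(counts.values())
print(counts, total)
```

Under these assumptions the total is 3,601,600,512 elements, which rounds to the 3.6B shown above and uses the same three tensor types.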

Inference API (serverless) is not available because the repository is disabled.

Model tree for twhoool02/Llama2-7b-chat-HF-NF4

Base model: meta-llama/Llama-2-7b-chat-hf
Quantized: this model