llama2.c-stories15M-pruned50
This repo contains model files for llama2.c 15M TinyStories, optimized for NM-vLLM, a high-throughput serving engine for compressed LLMs.
This model was pruned to 50% sparsity with SparseGPT using llm-compressor.
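Because the pruned checkpoint is saved with dense (uncompressed) weights, it should load like any standard Hugging Face model. Below is a minimal generation sketch using plain transformers; "REPO_ID" is a hypothetical placeholder for this repository's path on the Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "REPO_ID" with this repository's path on the Hugging Face Hub
repo_id = "REPO_ID"

model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Generate a short continuation of a TinyStories-style prompt
inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))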
Sparsification
Install llm-compressor:
pip install llmcompressor
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

# Source model to prune and dataset used for calibration
hf_model_stub = "Xenova/llama2.c-stories15M"
calibration_dataset = "open_platypus"
output_directory = f"{hf_model_stub.split('/')[-1]}-pruned_50.2of4-uncompressed"

# Load the dense model
model = SparseAutoModelForCausalLM.from_pretrained(
    hf_model_stub, torch_dtype="auto", device_map="auto"
)

# SparseGPT recipe: 50% sparsity with a 2:4 mask applied to the decoder layers
recipe = """
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      sequential_update: true
      mask_structure: "2:4"
      targets: ['re:model.layers.\d*$']
"""

# Apply the recipe in one shot using the calibration dataset
oneshot(
    model=model,
    dataset=calibration_dataset,
    recipe=recipe,
    output_dir=output_directory,
)

# Save the pruned model with dense (uncompressed) weights
model.save_pretrained(output_directory, save_compressed=False)
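After the oneshot run it can be useful to confirm that the target layers were actually pruned. The following is a minimal sketch, assuming the output directory produced by the script above; it reloads the checkpoint with plain transformers and reports the fraction of zero-valued weights in each Linear module of the decoder layers:

import torch
from transformers import AutoModelForCausalLM

# Directory written by the sparsification script above
output_directory = "llama2.c-stories15M-pruned_50.2of4-uncompressed"
pruned = AutoModelForCausalLM.from_pretrained(output_directory)

# Report the share of zeroed weights per pruned Linear layer
for name, module in pruned.named_modules():
    if isinstance(module, torch.nn.Linear) and "model.layers" in name:
        weight = module.weight.detach()
        sparsity = (weight == 0).float().mean().item()
        print(f"{name}: {sparsity:.2%} zeros")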
Slack
For further support, and to discuss these models and AI in general, join Neural Magic's Slack Community.