- SliceGPT: Compress Large Language Models by Deleting Rows and Columns • arXiv:2401.15024 • Published Jan 26, 2024
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning • arXiv:2407.07523 • Published Jul 10, 2024
- Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models • arXiv:2407.12327 • Published Jul 17, 2024
- Compact Language Models via Pruning and Knowledge Distillation • arXiv:2407.14679 • Published Jul 19, 2024
- DDK: Distilling Domain Knowledge for Efficient Large Language Models • arXiv:2407.16154 • Published Jul 23, 2024
- Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning • arXiv:2408.00690 • Published Aug 1, 2024
- MobileQuant: Mobile-friendly Quantization for On-device Language Models • arXiv:2408.13933 • Published Aug 25, 2024
- VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models • arXiv:2409.17066 • Published Sep 25, 2024
- Addition is All You Need for Energy-efficient Language Models • arXiv:2410.00907 • Published Oct 1, 2024