TinyStories: How Small Can Language Models Be and Still Speak Coherent English? Paper • 2305.07759 • Published May 12, 2023 • 33
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 44
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 96
LLM-FP4: 4-Bit Floating-Point Quantized Transformers Paper • 2310.16836 • Published Oct 25, 2023 • 13
PockEngine: Sparse and Efficient Fine-tuning in a Pocket Paper • 2310.17752 • Published Oct 26, 2023 • 12
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9, 2024 • 24
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 53