- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
  Paper • 2307.05695 • Published • 22
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 84
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
Collections including paper arxiv:2307.05695
- Efficient Few-Shot Learning Without Prompts
  Paper • 2209.11055 • Published • 3
- Parameter-Efficient Transfer Learning for NLP
  Paper • 1902.00751 • Published • 1
- GPT Understands, Too
  Paper • 2103.10385 • Published • 8
- The Power of Scale for Parameter-Efficient Prompt Tuning
  Paper • 2104.08691 • Published • 9
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 22
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 12
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 30
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 11