Collections including paper arxiv:2312.09571

- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 111
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 21
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 15
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
  Paper • 2401.07872 • Published • 2

- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 14
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 33

- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 26
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 14

- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 45
- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
  Paper • 2312.02949 • Published • 11
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 19

- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 2
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 16
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12

- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 41
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 17
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
- PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
  Paper • 2312.17276 • Published • 15

- TRAMS: Training-free Memory Selection for Long-range Language Modeling
  Paper • 2310.15494 • Published • 1
- A Long Way to Go: Investigating Length Correlations in RLHF
  Paper • 2310.03716 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- Giraffe: Adventures in Expanding Context Lengths in LLMs
  Paper • 2308.10882 • Published • 1

- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 41
- When can transformers reason with abstract symbols?
  Paper • 2310.09753 • Published • 2
- Improving Length-Generalization in Transformers via Task Hinting
  Paper • 2310.00726 • Published • 1
- In-context Autoencoder for Context Compression in a Large Language Model
  Paper • 2307.06945 • Published • 27

- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 39
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 24