-
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 25 -
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 24 -
Differential Transformer
Paper • 2410.05258 • Published • 165 -
UniMuMo: Unified Text, Music and Motion Generation
Paper • 2410.04534 • Published • 18
Collections
Discover the best community collections!
Collections including paper arxiv:2410.05258
-
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Paper • 2409.10516 • Published • 37 -
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Paper • 2409.11242 • Published • 5 -
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Paper • 2409.11136 • Published • 21 -
On the Diagram of Thought
Paper • 2409.10038 • Published • 11
-
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Paper • 2409.08513 • Published • 10 -
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper • 2409.08264 • Published • 43 -
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 73 -
LLMs + Persona-Plug = Personalized LLMs
Paper • 2409.11901 • Published • 30
-
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 83 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 80 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 61 -
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3
-
Associative Recurrent Memory Transformer
Paper • 2407.04841 • Published • 31 -
Mixture-of-Agents Enhances Large Language Model Capabilities
Paper • 2406.04692 • Published • 55 -
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 63 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 251
-
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
Paper • 2406.10996 • Published • 32 -
Simulating Classroom Education with LLM-Empowered Agents
Paper • 2406.19226 • Published • 29 -
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Paper • 2406.19389 • Published • 51 -
LAMBDA: A Large Model Based Data Agent
Paper • 2407.17535 • Published • 34
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 19 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 8 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 27 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 118
-
Addition is All You Need for Energy-efficient Language Models
Paper • 2410.00907 • Published • 143 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 602 -
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 73 -
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • 2405.08707 • Published • 27