- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 121
- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 86
- Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
  Paper • 2409.02634 • Published • 85
- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 81
Collections
Collections including paper arxiv:2409.18869

- VILA^2: VILA Augmented VILA
  Paper • 2407.17453 • Published • 38
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 118
- Octo-planner: On-device Language Model for Planner-Action Agents
  Paper • 2406.18082 • Published • 47
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve
  Paper • 2407.18219 • Published • 3

- INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
  Paper • 2307.03712 • Published • 1
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
  Paper • 2408.04093 • Published • 4
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 67
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 125
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 85

- MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
  Paper • 2409.17481 • Published • 44
- Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
  Paper • 2409.14683 • Published • 8
- Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
  Paper • 2409.17422 • Published • 23
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 75

- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
  Paper • 2409.08513 • Published • 10
- Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
  Paper • 2409.08264 • Published • 42
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
  Paper • 2409.12191 • Published • 69
- LLMs + Persona-Plug = Personalized LLMs
  Paper • 2409.11901 • Published • 30