- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 78
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5
Collections including paper arxiv:2410.05258
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 32
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 11
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 15
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
  Paper • 2312.05605 • Published • 1
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 37
- Rethinking Patch Dependence for Masked Autoencoders
  Paper • 2401.14391 • Published • 22
- Deconstructing Denoising Diffusion Models for Self-Supervised Learning
  Paper • 2401.14404 • Published • 16
- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 38
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 82
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 82
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 82
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 19
- Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
  Paper • 2309.15129 • Published • 6
- Vision Transformers Need Registers
  Paper • 2309.16588 • Published • 77