Collections including paper arxiv:2410.05258
- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 22
- Differential Transformer
  Paper • 2410.05258 • Published • 165
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 6
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 24

- DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
  Paper • 2410.00201 • Published
- Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems
  Paper • 2409.19804 • Published
- Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling
  Paper • 2409.15156 • Published
- Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
  Paper • 2409.04927 • Published