- Differential Transformer
  Paper • 2410.05258 • Published • 165
- Baichuan-Omni Technical Report
  Paper • 2410.08565 • Published • 82
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
  Paper • 2410.17243 • Published • 86
- FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
  Paper • 2410.16271 • Published • 80
Collections
Collections including paper arxiv:2410.05258
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 47
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 50
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 22
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 23

- Differential Transformer
  Paper • 2410.05258 • Published • 165
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 5
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
  Paper • 2410.23168 • Published • 17

- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
  Paper • 2410.02707 • Published • 47
- Differential Transformer
  Paper • 2410.05258 • Published • 165
- RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
  Paper • 2410.05193 • Published • 12
- DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
  Paper • 2410.03864 • Published • 10

- Differential Transformer
  Paper • 2410.05258 • Published • 165
- Stable Consistency Tuning: Understanding and Improving Consistency Models
  Paper • 2410.18958 • Published • 9
- COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
  Paper • 2410.19313 • Published • 18