Collections
Collections including paper arxiv:2305.13571

- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Transformers Can Represent n-gram Language Models
  Paper • 2404.14994 • Published • 18
- Are Sixteen Heads Really Better than One?
  Paper • 1905.10650 • Published • 2
- Reasoning in Large Language Models: A Geometric Perspective
  Paper • 2407.02678 • Published • 1

- Length Generalization of Causal Transformers without Position Encoding
  Paper • 2404.12224 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2

- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
  Paper • 2404.07413 • Published • 36
- Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
  Paper • 2404.05238 • Published • 3
- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 35
- PIQA: Reasoning about Physical Commonsense in Natural Language
  Paper • 1911.11641 • Published • 2
- AQuA: A Benchmarking Tool for Label Quality Assessment
  Paper • 2306.09467 • Published • 1

- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Position Prediction as an Effective Pretraining Strategy
  Paper • 2207.07611 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5

- Cure the headache of Transformers via Collinear Constrained Attention
  Paper • 2309.08646 • Published • 12
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
  Paper • 2309.10400 • Published • 26
- Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
  Paper • 2205.13522 • Published • 1