Collections
Collections including paper arxiv:2305.13571

- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Transformers Can Represent n-gram Language Models
  Paper • 2404.14994 • Published • 18
- Are Sixteen Heads Really Better than One?
  Paper • 1905.10650 • Published • 2
- Reasoning in Large Language Models: A Geometric Perspective
  Paper • 2407.02678 • Published • 1

- Length Generalization of Causal Transformers without Position Encoding
  Paper • 2404.12224 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2

- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
  Paper • 2404.07413 • Published • 36
- Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
  Paper • 2404.05238 • Published • 3
- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 35
- PIQA: Reasoning about Physical Commonsense in Natural Language
  Paper • 1911.11641 • Published • 2
- AQuA: A Benchmarking Tool for Label Quality Assessment
  Paper • 2306.09467 • Published • 1

- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Position Prediction as an Effective Pretraining Strategy
  Paper • 2207.07611 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5

- Cure the headache of Transformers via Collinear Constrained Attention
  Paper • 2309.08646 • Published • 12
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
  Paper • 2309.10400 • Published • 26
- Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
  Paper • 2205.13522 • Published • 1