arxiv:2409.02060

OLMoE: Open Mixture-of-Experts Language Models

Published on Sep 3 · Submitted by Muennighoff on Sep 4 · #1 Paper of the day

Abstract

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.
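To make the "7B total parameters, 1B active per token" idea concrete, below is a minimal sketch of a sparse MoE feed-forward layer with top-k routing, written in PyTorch. The layer sizes, expert count, top-k value, and routing scheme (softmax over the top-k router logits) are illustrative placeholders for a common MoE pattern, not a reproduction of OLMoE's actual configuration or implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Toy sparse MoE feed-forward block: many experts, few active per token."""

    def __init__(self, d_model=1024, d_ff=2048, num_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):
        # x: (num_tokens, d_model)
        router_logits = self.router(x)                               # (tokens, num_experts)
        top_vals, top_idx = router_logits.topk(self.top_k, dim=-1)   # pick k experts per token
        top_weights = F.softmax(top_vals, dim=-1)                    # normalize over the chosen k
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is why "active" parameters per token are far fewer than total parameters.
        for expert_id, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = top_idx[:, slot] == expert_id
                if mask.any():
                    out[mask] += top_weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    tokens = torch.randn(4, 1024)   # 4 tokens' worth of hidden states
    layer = SparseMoELayer()
    print(layer(tokens).shape)      # torch.Size([4, 1024])
```

Because each token only passes through the experts its router selects, the compute and parameters touched per token stay roughly constant even as the total expert count, and hence total parameter count, grows.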

Community

Hey, amazing work :)
We've summarised this and a few other papers on our blog. Hope you like it!

  1. KTO: The infamous alignment algorithm
  2. OLMoE: Open Data, Weights, Code Mixture of Experts models
  3. Mamba in the LlaMA: Distilling from Transformers to Mamba
  4. PlanSearch: Improving Code Generation via Planning

https://datta0.substack.com/p/ai-unplugged-19-kto-for-model-alignment

It is awesome.

Models citing this paper: 5

Datasets citing this paper: 1

Spaces citing this paper: 5

Collections including this paper: 15