arxiv:2403.19887

Jamba: A Hybrid Transformer-Mamba Language Model

Published on Mar 28 · Submitted by akhaliq on Apr 1
#1 Paper of the day

Abstract

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
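The layer pattern the abstract describes (mostly Mamba layers with occasional attention layers interleaved, and a mixture-of-experts feed-forward replacing the dense MLP in some layers) can be sketched compactly. The snippet below is an illustrative sketch, not the authors' implementation: the Mamba layer is replaced by a gated causal-convolution stand-in (the real model uses the selective SSM from mamba_ssm), and the names and defaults (JambaLikeStack, MambaStandIn, attn_every, moe_every, dimensions, expert counts) are placeholder choices of ours. In the paper's own configuration, roughly one layer in eight uses attention and every second layer uses MoE with 16 experts and top-2 routing.

```python
# Illustrative sketch of a Jamba-like hybrid stack (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionLayer(nn.Module):
    """Pre-norm causal self-attention with a residual connection."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        t = x.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        return x + out


class MambaStandIn(nn.Module):
    """Placeholder for a Mamba (selective SSM) layer: a gated causal depthwise
    convolution. Swap in mamba_ssm.Mamba for the real state-space mixer."""
    def __init__(self, d_model: int, d_conv: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.conv = nn.Conv1d(d_model, d_model, d_conv, padding=d_conv - 1, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        h = self.norm(x)
        u, gate = self.in_proj(h).chunk(2, dim=-1)
        u = self.conv(u.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)  # causal conv
        return x + self.out_proj(F.silu(gate) * u)


class MoEMLP(nn.Module):
    """Top-k routed mixture-of-experts MLP: total capacity grows with n_experts
    while only top_k experts are active per token. For clarity every expert is
    evaluated and then masked; real implementations dispatch only routed tokens."""
    def __init__(self, d_model: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):
        weights, idx = self.router(x).topk(self.top_k, dim=-1)   # (B, T, k)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            gate = (weights * (idx == e)).sum(-1, keepdim=True)  # per-token gate
            out = out + gate * expert(x)
        return out


def dense_mlp(d_model: int) -> nn.Module:
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))


class JambaLikeStack(nn.Module):
    """One attention layer per `attn_every` layers, Mamba(-stand-in) otherwise;
    an MoE MLP every `moe_every` layers, a dense MLP otherwise."""
    def __init__(self, d_model=512, n_layers=8, attn_every=4, moe_every=2):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            mixer = (AttentionLayer(d_model) if (i + 1) % attn_every == 0
                     else MambaStandIn(d_model))
            mlp = (MoEMLP(d_model) if (i + 1) % moe_every == 0
                   else dense_mlp(d_model))
            self.layers.append(nn.ModuleDict({
                "mixer": mixer,
                "mlp_norm": nn.LayerNorm(d_model),
                "mlp": mlp,
            }))

    def forward(self, x):
        for layer in self.layers:
            x = layer["mixer"](x)                       # attention or Mamba, residual inside
            x = x + layer["mlp"](layer["mlp_norm"](x))  # dense or MoE feed-forward
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)
    print(JambaLikeStack()(x).shape)  # torch.Size([2, 16, 512])
```

The memory and throughput advantages the abstract mentions come from this split: only the occasional attention layers hold a KV cache, so cache size grows far more slowly with context length than in a pure Transformer, while the MoE layers add capacity without increasing the per-token active parameter count.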

Community

good

TechxGenus (Hao Jiang) made a 69M-parameter model: https://huggingface.co/TechxGenus/Mini-Jamba
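If you want to poke at that checkpoint, something like the following should work, assuming the repo exposes a standard transformers causal-LM interface (recent transformers releases include Jamba support; if the mini model ships custom modeling code, trust_remote_code=True would be needed). This is an untested sketch, not instructions from the model card:

```python
# Untested sketch: load the community Mini-Jamba checkpoint with the
# standard transformers API. Assumes AutoModelForCausalLM can resolve the
# architecture; add trust_remote_code=True if the repo uses custom code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TechxGenus/Mini-Jamba"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```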


I have featured this paper on my blog. You can read it at https://rb.gy/6r0rs3

Jamba: Revolutionizing Language Models with a Hybrid Transformer Approach

Links πŸ”—:

πŸ‘‰ Subscribe: https://www.youtube.com/@Arxflix
πŸ‘‰ Twitter: https://x.com/arxflix
πŸ‘‰ LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 4

Datasets citing this paper 0


Spaces citing this paper 7

Collections including this paper 44