license: apache-2.0
language:
  - en
base_model:
  - microsoft/Phi-3-mini-4k-instruct
pipeline_tag: image-text-to-text

LibMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models

Introduction

Mixture of Experts (MoE) models play an essential role in the development of more efficient and effective large language models (LLMs). Because of their enormous resource requirements, large-scale MoE algorithms remain inaccessible to many researchers. This work introduces LibMoE, a comprehensive and modular framework designed to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles, (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoEs in LLMs more accessible to a wider range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms across three different LLMs and 11 datasets under a zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform similarly when averaged across a broad range of tasks. With its modular design and extensive evaluation capabilities, we believe LibMoE will be invaluable for researchers striving to make meaningful progress toward the next generation of MoE and LLMs.
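For readers new to the setting, the sketch below illustrates the general idea behind a sparse MoE layer: a router scores every expert for a given input, only the top-k experts are evaluated, and their outputs are combined with renormalized gate weights. This is a generic top-k softmax router written in plain Python for illustration only; it is not LibMoE's implementation, and the names `top_k_route`, `moe_forward`, and the toy experts are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Select the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Toy scalar "experts" (hypothetical); real experts are feed-forward networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 1.5]  # illustrative router scores for one token

gates = top_k_route(router_logits, k=2)
output = moe_forward(3.0, experts, router_logits, k=2)
print(gates, output)
```

With these toy logits, only experts 1 and 3 are evaluated, so the cost per input depends on k rather than on the total number of experts.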

Model and Evaluation Benchmarks

We have released models for five MoE algorithms, each built on microsoft/Phi-3-mini-4k-instruct as the language model and SigLIP as the vision encoder. These models were trained on the LLAVA-665K dataset. We evaluated these state-of-the-art algorithms on 11 benchmarks, examining various aspects of MoE algorithm performance.

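| Model | MoE Method | AI2D | Text VQA | GQA | Hallusion Benchmark | MathVista Validation | MMBenchEN / dev | MMMU Validation | MMStar | POPE | SQA IMG Full | MME | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SigLIP 224 + Phi3 | SMoE-R | 64.35 | 40.35 | 60.03 | 41.75 | 28.7 | 67.96 | 40.22 | 39.47 | 84.31 | 80.71 | 1,655.81 | 54.78 |
| SigLIP 224 + Phi3 | Cosine-R | 64.6 | 41.98 | 60.74 | 41.43 | 31.3 | 70.61 | 41.22 | 38.5 | 86.33 | 81.49 | 1,759.21 | 55.82 |
| SigLIP 224 + Phi3 | Sigmoid-R | 64.66 | 41.05 | 60.52 | 40.8 | 28.8 | 69.07 | 40.89 | 39.29 | 86.54 | 80.85 | 1,766.03 | 55.25 |
| SigLIP 224 + Phi3 | Hyper-R | 65.12 | 41.67 | 59.88 | 41.32 | 30.3 | 69.33 | 41.44 | 39.86 | 85.4 | 79.03 | 1,752.39 | 55.34 |
| SigLIP 224 + Phi3 | Perturbed Cosine-R | 64.8 | 41.89 | 61.0 | 40.9 | 31.8 | 70.7 | 42.0 | 39.6 | 86.43 | 81.44 | 1,776.54 | 56.06 |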
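The source does not state how the AVG column is computed. The reported values are consistent with an unweighted mean over the ten benchmarks other than MME, which is reported on a different scale. The short check below recomputes that mean from the scores in the table; treat the formula as our reading of the numbers rather than a documented definition. The helper name `mean_excluding_mme` is hypothetical.

```python
# Scores copied from the results table, in column order:
# AI2D, Text VQA, GQA, Hallusion, MathVista, MMBench, MMMU, MMStar, POPE, SQA IMG.
# MME is left out because it is not on a 0-100 scale (assumption about the AVG formula).
rows = {
    "SMoE-R": ([64.35, 40.35, 60.03, 41.75, 28.7, 67.96, 40.22, 39.47, 84.31, 80.71], 54.78),
    "Cosine-R": ([64.6, 41.98, 60.74, 41.43, 31.3, 70.61, 41.22, 38.5, 86.33, 81.49], 55.82),
    "Sigmoid-R": ([64.66, 41.05, 60.52, 40.8, 28.8, 69.07, 40.89, 39.29, 86.54, 80.85], 55.25),
    "Hyper-R": ([65.12, 41.67, 59.88, 41.32, 30.3, 69.33, 41.44, 39.86, 85.4, 79.03], 55.34),
    "Perturbed Cosine-R": ([64.8, 41.89, 61.0, 40.9, 31.8, 70.7, 42.0, 39.6, 86.43, 81.44], 56.06),
}

def mean_excluding_mme(scores):
    """Unweighted mean of the ten non-MME benchmark scores."""
    return sum(scores) / len(scores)

recomputed = {name: mean_excluding_mme(scores) for name, (scores, _) in rows.items()}

for name, (_, reported) in rows.items():
    print(f"{name:20s} recomputed={recomputed[name]:.3f} reported={reported:.2f}")
```

Under this reading, every recomputed mean lands within 0.01 of the reported AVG, and the spread between the lowest and highest method is a little over one point, which matches the observation that the algorithms perform similarly on average.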

Run LibMoE

We provide detailed instructions for setting up and running experiments in this repository: https://github.com/Fsoft-AIC/LibMoE

Hardware Resources

| Stage | MoE Method | Hardware |
|---|---|---|
| Pre-Training | | 4xA100 |
| Pre-FineTuning | | 4xA100 |
| VIT | SMoE-R | 6xA100 |
| VIT | Cosine-R | 6xA100 |
| VIT | Sigmoid-R | 6xA100 |
| VIT | Hyper-R | 6xA100 |
| VIT | Perturbed Cosine-R | 6xA100 |

Citation Information

More details can be found in our paper.

If you use LibMoE, please cite it using this BibTeX:

@misc{nguyen2024libmoelibrarycomprehensivebenchmarking,
      title={LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models}, 
      author={Nam V. Nguyen and Thong T. Doan and Luong Tran and Van Nguyen and Quang Pham},
      year={2024},
      eprint={2411.00918},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.00918}, 
}