README.md · Fsoft-AIC/Phi3-SigLiP-MoE at main

license: apache-2.0
language:
  - en
base_model:
  - microsoft/Phi-3-mini-4k-instruct
pipeline_tag: image-text-to-text

LibMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models

Introduction

Mixture of Experts (MoE) models play an essential role in the development of more efficient and effective large language models (LLMs). Because of their enormous resource requirements, large-scale MoE algorithms remain inaccessible to many researchers. This work introduces LibMoE, a comprehensive and modular framework designed to streamline the research, training, and evaluation of MoE algorithms. Built on three core principles, (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoE in LLMs accessible to a wider range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms across three different LLMs and 11 datasets under a zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform similarly when averaged across a broad range of tasks. With its modular design and extensive evaluation capabilities, we believe LibMoE will be invaluable for researchers striving to make meaningful progress toward the next generation of MoE and LLMs.

Model and Evaluation Benchmarks

We have released models for five MoE algorithms, each built on microsoft/Phi-3-mini-4k-instruct as the language model and SigLIP as the vision encoder. These models were trained on the LLaVA-665K dataset. We evaluated them on 11 benchmarks that examine various aspects of MoE algorithm performance.

Model	MoE Method	AI2D	Text VQA	GQA	Hallusion Benchmark	MathVista Validation	MMBenchEN / dev	MMMU Validation	MMStar	POPE	SQA IMG Full	MME	AVG
SigLIP 224 + Phi3	SMoE-R	64.35	40.35	60.03	41.75	28.7	67.96	40.22	39.47	84.31	80.71	1,655.81	54.78
SigLIP 224 + Phi3	Cosine-R	64.6	41.98	60.74	41.43	31.3	70.61	41.22	38.5	86.33	81.49	1,759.21	55.82
SigLIP 224 + Phi3	Sigmoid-R	64.66	41.05	60.52	40.8	28.8	69.07	40.89	39.29	86.54	80.85	1,766.03	55.25
SigLIP 224 + Phi3	Hyper-R	65.12	41.67	59.88	41.32	30.3	69.33	41.44	39.86	85.4	79.03	1,752.39	55.34
SigLIP 224 + Phi3	Perturbed Cosine-R	64.8	41.89	61.0	40.9	31.8	70.7	42.0	39.6	86.43	81.44	1,776.54	56.06
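
The source does not say how the AVG column is computed. Judging from the numbers alone, it appears to be the unweighted mean of the ten benchmarks other than MME, whose score is on a different scale. The sketch below checks that reading against the reported values; the names `SCORES` and `mean_without_mme` are illustrative and are not part of LibMoE.

```python
# Per-method scores copied from the table above, in column order:
# AI2D, Text VQA, GQA, Hallusion Benchmark, MathVista Validation,
# MMBenchEN / dev, MMMU Validation, MMStar, POPE, SQA IMG Full.
# MME and the reported AVG are stored separately.
SCORES = {
    "SMoE-R": ([64.35, 40.35, 60.03, 41.75, 28.7, 67.96, 40.22, 39.47, 84.31, 80.71], 1655.81, 54.78),
    "Cosine-R": ([64.6, 41.98, 60.74, 41.43, 31.3, 70.61, 41.22, 38.5, 86.33, 81.49], 1759.21, 55.82),
    "Sigmoid-R": ([64.66, 41.05, 60.52, 40.8, 28.8, 69.07, 40.89, 39.29, 86.54, 80.85], 1766.03, 55.25),
    "Hyper-R": ([65.12, 41.67, 59.88, 41.32, 30.3, 69.33, 41.44, 39.86, 85.4, 79.03], 1752.39, 55.34),
    "Perturbed Cosine-R": ([64.8, 41.89, 61.0, 40.9, 31.8, 70.7, 42.0, 39.6, 86.43, 81.44], 1776.54, 56.06),
}


def mean_without_mme(scores):
    """Unweighted mean of the ten non-MME benchmark scores (assumed AVG definition)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for method, (scores, mme, reported_avg) in SCORES.items():
        computed = mean_without_mme(scores)
        # A gap below 0.01 means the computed mean matches the table up to rounding.
        gap = abs(computed - reported_avg)
        print(f"{method:20s} computed={computed:.3f} reported={reported_avg:.2f} gap={gap:.3f}")
```

Under this assumption, the computed means agree with the reported AVG values to within rounding, and the spread between the best and worst method is a little over one point.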

Run LibMoE

We provide detailed instructions for setting up and running experiments in this repository: https://github.com/Fsoft-AIC/LibMoE

Hardware Resources

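Stage	MoE Method	Hardware
Pre-Training		4xA100
Pre-FineTuning		4xA100
Visual Instruction Tuning (VIT)	SMoE-R	6xA100
Visual Instruction Tuning (VIT)	Cosine-R	6xA100
Visual Instruction Tuning (VIT)	Sigmoid-R	6xA100
Visual Instruction Tuning (VIT)	Hyper-R	6xA100
Visual Instruction Tuning (VIT)	Perturbed Cosine-R	6xA100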

Citation Information

More details can be found in our paper: https://arxiv.org/abs/2411.00918

If you use LibMoE, please cite it using this BibTeX:

@misc{nguyen2024libmoelibrarycomprehensivebenchmarking,
      title={LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models}, 
      author={Nam V. Nguyen and Thong T. Doan and Luong Tran and Van Nguyen and Quang Pham},
      year={2024},
      eprint={2411.00918},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.00918}, 
}