---
license: apache-2.0
datasets:
- gair-prox/open-web-math-pro
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
library_name: transformers
---

# Mistral-7B-ProXMath

<p align="center">
  <img src="prox-teaser.png">
</p>

[ArXiv](http://arxiv.org/abs/2409.17115) | [Data: OpenWebMath-Pro](https://huggingface.co/datasets/gair-prox/open-web-math-pro) | [Code](https://github.com/GAIR-NLP/program-every-example)

**Mistral-7B-ProXMath** is a math-adapted version of Mistral-7B-v0.1, continually pre-trained for **10**B tokens on [OpenWebMath-Pro](https://huggingface.co/datasets/gair-prox/open-web-math-pro), a version of OpenWebMath refined by ProX.
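Since the card lists `transformers` as the library and `text-generation` as the pipeline, loading the model might look like the sketch below. Several things here are assumptions rather than documented facts: the Hub id `gair-prox/Mistral-7B-ProXMath`, the `Question: ... Answer:` prompt layout (a hypothetical completion-style format, since this is a base model without a chat template), and the use of `device_map="auto"`, which requires the `accelerate` package.

```python
# Assumption: the checkpoint is published under this Hub id; adjust if it differs.
MODEL_ID = "gair-prox/Mistral-7B-ProXMath"


def build_prompt(question: str) -> str:
    """Build a plain completion-style prompt (hypothetical format, not from the card)."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and greedily decode a continuation for one question."""
    # Imported lazily so build_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumes hardware with bfloat16 support
        device_map="auto",           # requires the `accelerate` package
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Decode only the newly generated tokens, not the prompt itself.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is the sum of the first 10 positive integers?")` downloads the weights on first use and returns the model's continuation as a string. Greedy decoding is used here only to keep the sketch deterministic.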

## Evaluations

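Mistral-7B-ProXMath is evaluated on 9 common math reasoning benchmarks and compared against its base model, Mistral-7B-v0.1. Bold marks the better score in each column.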


| Model               |   asdiv  |   gsm8k  |  mathqa  |   mawps  | minerva_math | mmlu_stem | sat_math |   svamp  |  tabmwp  |  average |
|:---------------------:|:--------:|:--------:|:--------:|:--------:|:------------:|:---------:|:--------:|:--------:|:--------:|:--------:|
| Mistral-7B-v0.1     |   68.5   |   40.6   |   32.3   |   87.0   |     11.4     |    50.0   |   56.2   | **65.4** | **52.9** |   51.6   |
| Mistral-7B-ProXMath | **72.9** | **51.0** | **53.0** | **89.2** |   **22.4**   |  **54.2** | **75.0** |   64.9   |   49.8   | **59.2** |
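The `average` column is consistent with an unweighted mean of the nine per-benchmark scores, rounded to one decimal place. That reading is inferred from the numbers rather than stated in the card; a small check, with scores copied from the table above:

```python
# Scores copied from the table above, in column order (asdiv ... tabmwp).
scores = {
    "Mistral-7B-v0.1": [68.5, 40.6, 32.3, 87.0, 11.4, 50.0, 56.2, 65.4, 52.9],
    "Mistral-7B-ProXMath": [72.9, 51.0, 53.0, 89.2, 22.4, 54.2, 75.0, 64.9, 49.8],
}


def macro_average(values):
    """Unweighted mean over benchmarks, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


averages = {name: macro_average(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Both results match the reported averages (51.6 and 59.2), so each benchmark contributes equally regardless of its size.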


## Citation
```
@article{zhou2024programming,
  title={Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale},
  author={Zhou, Fan and Wang, Zengzhi and Liu, Qian and Li, Junlong and Liu, Pengfei},
  journal={arXiv preprint arXiv:2409.17115},
  year={2024}
}
```