---
license: apache-2.0
datasets:
- gair-prox/FineWeb-pro
language:
- en
tags:
- llama
pipeline_tag: text-generation
library_name: transformers
---
# FW-ProX-1.7B

<p align="center">
  <img src="prox-teaser.png">
</p>

[ArXiv](https://arxiv.org/abs/2409.17115) | [Models](https://huggingface.co/gair-prox/FW-ProX-1.7B) | [Data](https://huggingface.co/datasets/gair-prox/FineWeb-pro) | [Code](https://github.com/GAIR-NLP/program-every-example)
**FW-ProX-1.7B** is a small language model trained on [FineWeb-pro](https://huggingface.co/datasets/gair-prox/FineWeb-pro) for 50B tokens.
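Below is a minimal text-generation sketch using the `transformers` library indicated in this card's metadata; the prompt, dtype, and generation settings are illustrative assumptions, not recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hub (identifier taken from this card).
model_id = "gair-prox/FW-ProX-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Greedy decoding of a short continuation; adjust max_new_tokens / sampling as needed.
prompt = "Large-scale pre-training data can be improved by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```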
## Evaluations
ProX models are evaluated on 10 language model benchmarks in the zero-shot setting; a reproduction sketch follows the table below.
|                    | ARC-c | ARC-e | CSQA  | HellaS | MMLU  | OBQA  | PiQA  | SIQA  | WinoG | SciQ  | AVG  |
|--------------------|-------|-------|-------|--------|-------|-------|-------|-------|-------|-------|------|
| FineWeb (raw)      | 28.5  | 52.6  | 33.9  | 53.2   | 29.8  | 32.6  | 72.9  | 40.2  | 53.0  | 77.1  | 47.4 |
| FineWeb-pro (ours) | 34.4  | 63.9  | 32.6  | 53.0   | 33.1  | 34.4  | 73.1  | 39.3  | 52.7  | 81.5  | 49.8 |
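The card does not name the evaluation harness; one plausible way to reproduce a zero-shot run is EleutherAI's `lm-evaluation-harness`, sketched below. The task identifiers are assumptions mapping the column abbreviations and may differ across harness versions.

```python
import lm_eval  # pip install lm-eval

# Assumed task names for the 10 benchmarks above; verify against your harness version.
tasks = [
    "arc_challenge", "arc_easy", "commonsense_qa", "hellaswag", "mmlu",
    "openbookqa", "piqa", "social_iqa", "winogrande", "sciq",
]

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gair-prox/FW-ProX-1.7B,dtype=bfloat16",
    tasks=tasks,
    num_fewshot=0,  # zero-shot, as reported in the table
)

# Print the per-task metrics returned by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```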
### Citation
```
@article{zhou2024programming,
  title={Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale},
  author={Zhou, Fan and Wang, Zengzhi and Liu, Qian and Li, Junlong and Liu, Pengfei},
  journal={arXiv preprint arXiv:2409.17115},
  year={2024}
}
```