File size: 3,073 Bytes
cd9bf63 d114d12 cd9bf63 7ee38f3 42fa051 cd9bf63 7ee38f3 f7cec35 1cff19b 0038b20 7ee38f3 0038b20 7ee38f3 5794441 7ee38f3 837b0fd 0038b20 7ee38f3 5794441 7ee38f3 5794441 0038b20 7ee38f3 5794441 7ee38f3 5794441 0038b20 7ee38f3 5794441 7ee38f3 5794441 7ee38f3 837b0fd 0038b20 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
---
language:
- zh
- en
license: apache-2.0
datasets:
- wenbopan/Fusang-v1
- wenbopan/OpenOrca-zh-20k
---
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/62cd3a3691d27e60db0698b0/s21sMRxRT56c5t4M15GBP.webp)
# Faro-Yi-9B
Faro-Yi-9B is an improved [Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K) with extensive instruction tuning on [Fusang-V1](https://huggingface.co/datasets/wenbopan/Fusang-v1). Compared to Yi-9B-200K, Faro-Yi-9B has gained greater capability in various downstream tasks and long-context modeling thanks to the large-scale synthetic data in Fusang-V1.
## Performance
Faro-Yi-9B enhances its ability compared to Yi-9B-200K in most dimensions, especially in long-range modeling and bilingual (English, Chinese) understanding. Fi is competitive among all open-sourced models at around 9B parameters. Fi-9B is good at both factual tasks and preferred by LLM-judges.
### Fact-based Evaluation (Open LLM Leaderboard)
| **Metric** | **MMLU** | GSM8K | **HellaSwag** | **TruthfulQA** | **Arc** | **Winogrande** |
| -------------- | --------- | --------- | ------------- | -------------- | ----------- | -------------- |
| **Yi-9B-200K** | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 |
| **Faro-Yi-9B** | **68.80** | **63.08** | **57.28** | **40.86** | **72.58** | 71.11 |
### Long-context Modeling (LongBench)
| **Name** | **Average_zh** | **Average_en** | **Code Completion** |
|----------------|----------------|----------------|---------------------|
| **Yi-9B-200K** | 30.288 | 36.7071 | 72.2 |
| **Faro-Yi-9B** | **41.092** | **40.9536** | 46.0 |
<details>
<summary>Score breakdown</summary>
| **Name** | **Few-shot Learning_en** | **Synthetic Tasks_en** | **Single-Doc QA_en** | **Multi-Doc QA_en** | **Summarization_en** | **Few-shot Learning_zh** | **Synthetic Tasks_zh** | **Single-Doc QA_zh** | **Multi-Doc QA_zh** | **Summarization_zh** |
|----------------|--------------------------|------------------------|----------------------|---------------------|----------------------|--------------------------|------------------------|----------------------|---------------------|----------------------|
| **Yi-9B-200K** | 60.6 | 22.8 | 30.9 | 38.9 | 25.8 | 46.5 | 28.0 | 49.6 | 17.7 | 9.7 |
| **Faro-Yi-9B** | **63.8** | **40.2** | **36.2** | 38.0 | **26.3** | 30.0 | **75.1** | **55.6** | **30.7** | **14.1** |
</details>
<!--### Performance on Preference TODO-->
### Bilingual Ability (CMMLU & MMLU)
| **Name** | MMLU | **CMMLU** |
| -------------- | --------- | --------- |
| **Yi-9B-200K** | 65.73 | 71.97 |
| **Faro-Yi-9B** | **68.80** | **73.28** | |