---
license: apache-2.0
datasets:
- wenbopan/RefGPT-Fact-v2-8x
- wenbopan/anti-haystack
- wenbopan/OpenHermes-2.5-zh
language:
- zh
- en
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/62cd3a3691d27e60db0698b0/2peGbPRq4jE-OoS9ndkOx.jpeg)
# Fi-9B
Fi-9B is an improved [Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K) with extensive instruction tuning on [Fusang-V1](https://huggingface.co/datasets/wenbopan/Fusang-v1). Compared to Yi-9B-200K, Fi-9B performs better on various downstream tasks and in long-context modeling, thanks to the large-scale synthetic data in Fusang-V1.
## Performance
Fi-9B improves on Yi-9B-200K along most dimensions, especially in long-range modeling and bilingual (English, Chinese) understanding. Fi-9B is competitive among open-source models of around 9B parameters: it performs well on factual tasks and is also preferred by LLM judges.
### Fact-based Evaluation (Open LLM Leaderboard)
| **Model** | **MMLU** | **GSM8K** | **HellaSwag** | **TruthfulQA** | **ARC** | **Winogrande** |
| -------------- | --------- | --------- | ------------- | -------------- | ----------- | -------------- |
| **Yi-9B-200K** | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 |
| **Fi-9B-200K** | **68.80** | **63.08** | **57.28** | **40.86** | **72.58** | 71.11 |
### Long-context Modeling (LongBench)
| **Model** | **Average_zh** | **Average_en** | **Code Completion** |
|----------------|----------------|----------------|---------------------|
| **Yi-9B-200K** | 30.29 | 36.71 | 72.2 |
| **Fi-9B-200K** | **41.09** | **40.95** | 46.0 |
<details>
<summary>Score breakdown</summary>

| **Name** | **Few-shot Learning_en** | **Synthetic Tasks_en** | **Single-Doc QA_en** | **Multi-Doc QA_en** | **Summarization_en** | **Few-shot Learning_zh** | **Synthetic Tasks_zh** | **Single-Doc QA_zh** | **Multi-Doc QA_zh** | **Summarization_zh** |
|----------------|--------------------------|------------------------|----------------------|---------------------|----------------------|--------------------------|------------------------|----------------------|---------------------|----------------------|
| **Yi-9B-200K** | 60.6 | 22.8 | 30.9 | 38.9 | 25.8 | 46.5 | 28.0 | 49.6 | 17.7 | 9.7 |
| **Fi-9B-200K** | **63.8** | **40.2** | **36.2** | 38.0 | **26.3** | 30.0 | **75.1** | **55.6** | **30.7** | **14.1** |
</details>
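Since Fi-9B inherits Yi-9B-200K's 200K-token context window, long documents usually fit in a single prompt without chunking. As a quick sanity check before sending very large inputs, one can estimate the token budget. The sketch below is illustrative only: it uses a crude characters-per-token heuristic (roughly 4 characters per token for English text) rather than the model's actual tokenizer, and the helper name and output budget are assumptions, not part of this repository.

```python
# Rough context-budget check for a 200K-token window.
# NOTE: uses a chars-per-token heuristic, not the real Yi/Fi tokenizer,
# so treat the estimate as a ballpark figure only.

CONTEXT_WINDOW = 200_000   # Fi-9B inherits Yi-9B-200K's 200K-token window
CHARS_PER_TOKEN = 4        # crude heuristic for English text

def fits_in_context(text: str, reserved_for_output: int = 1_024) -> bool:
    """Return True if `text` likely fits, leaving room for the model's reply."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

# ~60K characters -> ~15K estimated tokens, well within the window
print(fits_in_context("hello " * 10_000))  # → True
```

For an exact count, tokenize with the model's own tokenizer instead of the heuristic above; CJK text in particular yields far fewer characters per token than English.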
<!--### Performance on Preference TODO-->
### Bilingual Ability (CMMLU & MMLU)
| **Model** | **MMLU** | **CMMLU** |
| -------------- | --------- | --------- |
| **Yi-9B-200K** | 65.73 | 71.97 |
| **Fi-9B-200K** | **68.80** | **73.28** |
## Current Limitations
- This version of Fi-9B may fail to stop generation in some scenarios. I will fix that soon.
- Compared to the original Yi-9B-200K, Fi-9B has degraded code-completion ability. This may be due to the lack of raw code data during instruction tuning.