---
license: apache-2.0
datasets:
- wenbopan/RefGPT-Fact-v2-8x
- wenbopan/anti-haystack
- wenbopan/OpenHermes-2.5-zh
language:
- zh
- en
---
# Fi-9B
Fi-9B is an improved Yi-9B-200K with extensive instruction tuning on Fusang-V1. Compared to Yi-9B-200K, Fi-9B has gained greater capability in various downstream tasks and long-context modeling thanks to the large-scale synthetic data in Fusang-V1.
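As a Yi-based chat model, Fi-9B should load with the standard Transformers APIs. Below is a minimal inference sketch; the repository ID `wenbopan/Fi-9B` and the presence of a bundled chat template are assumptions, not confirmed by this card:

```python
# Minimal inference sketch with Hugging Face Transformers.
# The repo ID "wenbopan/Fi-9B" is an assumption; adjust to the actual model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenbopan/Fi-9B"  # hypothetical repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# Build a chat-formatted prompt; assumes the tokenizer ships a chat template.
messages = [{"role": "user", "content": "Summarize the history of the Great Wall."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Thanks to the 200K context window inherited from Yi-9B-200K, long documents can be passed in the prompt directly, memory permitting.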
## Performance

Fi-9B improves on Yi-9B-200K in most dimensions, especially long-range modeling and bilingual (English and Chinese) understanding. Fi-9B is competitive among open-source models at around 9B parameters, performing well on factual tasks and being preferred by LLM judges.
### Fact-based Evaluation (Open LLM Leaderboard)

| Model | MMLU | GSM8K | HellaSwag | TruthfulQA | ARC | Winogrande |
|---|---|---|---|---|---|---|
| Yi-9B-200K | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 |
| Fi-9B-200K | 68.80 | 63.08 | 57.28 | 40.86 | 72.58 | 71.11 |
### Long-context Modeling (LongBench)

| Model | Average_zh | Average_en | Code Completion |
|---|---|---|---|
| Yi-9B-200K | 30.288 | 36.7071 | 72.2 |
| Fi-9B-200K | 41.092 | 40.9536 | 46.0 |
#### Score breakdown

| Model | Few-shot Learning_en | Synthetic Tasks_en | Single-Doc QA_en | Multi-Doc QA_en | Summarization_en | Few-shot Learning_zh | Synthetic Tasks_zh | Single-Doc QA_zh | Multi-Doc QA_zh | Summarization_zh |
|---|---|---|---|---|---|---|---|---|---|
| Yi-9B-200K | 60.6 | 22.8 | 30.9 | 38.9 | 25.8 | 46.5 | 28.0 | 49.6 | 17.7 | 9.7 |
| Fi-9B-200K | 63.8 | 40.2 | 36.2 | 38.0 | 26.3 | 30.0 | 75.1 | 55.6 | 30.7 | 14.1 |
### Bilingual Ability (CMMLU & MMLU)

| Model | MMLU | CMMLU |
|---|---|---|
| Yi-9B-200K | 65.73 | 71.97 |
| Fi-9B-200K | 68.80 | 73.28 |
## Current Limitations

- This version of Fi-9B may fail to stop generation in some scenarios. I will fix that soon; a possible workaround is sketched after this list.
- Compared to the original Yi-9B-200K, Fi-9B shows degraded code-completion ability. This may be due to the lack of raw code data during instruction tuning.
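Until the stopping fix lands, one possible workaround is a custom `StoppingCriteria` that halts generation on an end-of-turn marker. This is a minimal sketch reusing `model`, `tokenizer`, and `inputs` from the loading example above; the `"<|im_end|>"` marker is an assumption about the chat template, not confirmed by this card:

```python
# Hedged workaround: stop generation once an assumed end-of-turn marker appears.
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    def __init__(self, tokenizer, stop_string: str):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode only the tail of the sequence and check for the stop marker.
        tail = self.tokenizer.decode(input_ids[0][-16:], skip_special_tokens=False)
        return self.stop_string in tail

# "<|im_end|>" is an assumed marker; replace it with the model's actual end token.
stopping = StoppingCriteriaList([StopOnSubstring(tokenizer, "<|im_end|>")])
outputs = model.generate(inputs, max_new_tokens=256, stopping_criteria=stopping)
```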