---
license: apache-2.0
datasets:
  - wenbopan/RefGPT-Fact-v2-8x
  - wenbopan/anti-haystack
  - wenbopan/OpenHermes-2.5-zh
language:
  - zh
  - en
---

# Fi-9B

Fi-9B is an improved Yi-9B-200K with extensive instruction tuning on Fusang-V1. Compared to Yi-9B-200K, Fi-9B gains stronger performance on various downstream tasks and in long-context modeling, thanks to the large-scale synthetic data in Fusang-V1.
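For reference, below is a minimal inference sketch using the standard Hugging Face transformers API. The repo id `wenbopan/Faro-Yi-9B` is taken from this page, and the chat-template call assumes the tokenizer ships one; treat this as a sketch under those assumptions rather than official usage.

```python
# Minimal inference sketch, assuming the standard transformers API.
# The repo id "wenbopan/Faro-Yi-9B" is taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenbopan/Faro-Yi-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 9B model within a single modern GPU
    device_map="auto",
)

# Assumes the tokenizer defines a chat template; otherwise pass a plain prompt.
messages = [{"role": "user", "content": "Summarize the strengths of Yi-based models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```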

## Performance

Fi-9B improves on Yi-9B-200K in most dimensions, especially long-range modeling and bilingual (English, Chinese) understanding. It is competitive among open-source models at around 9B parameters: it performs well on factual tasks and is preferred by LLM judges.

### Fact-based Evaluation (Open LLM Leaderboard)

| Model | MMLU | GSM8K | HellaSwag | TruthfulQA | ARC | Winogrande |
| --- | --- | --- | --- | --- | --- | --- |
| Yi-9B-200K | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 |
| Fi-9B-200K | 68.80 | 63.08 | 57.28 | 40.86 | 72.58 | 71.11 |

### Long-context Modeling (LongBench)

| Name | Average_zh | Average_en | Code Completion |
| --- | --- | --- | --- |
| Yi-9B-200K | 30.288 | 36.7071 | 72.2 |
| Fi-9B-200K | 41.092 | 40.9536 | 46.0 |

#### Score breakdown

| Name | Few-shot Learning_en | Synthetic Tasks_en | Single-Doc QA_en | Multi-Doc QA_en | Summarization_en | Few-shot Learning_zh | Synthetic Tasks_zh | Single-Doc QA_zh | Multi-Doc QA_zh | Summarization_zh |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Yi-9B-200K | 60.6 | 22.8 | 30.9 | 38.9 | 25.8 | 46.5 | 28.0 | 49.6 | 17.7 | 9.7 |
| Fi-9B-200K | 63.8 | 40.2 | 36.2 | 38.0 | 26.3 | 30.0 | 75.1 | 55.6 | 30.7 | 14.1 |

### Bilingual Ability (CMMLU & MMLU)

| Name | MMLU | CMMLU |
| --- | --- | --- |
| Yi-9B-200K | 65.73 | 71.97 |
| Fi-9B-200K | 68.80 | 73.28 |

## Current Limitations

- This version of Fi-9B may fail to stop generation in some scenarios; I will fix that soon. A generic workaround sketch follows this list.
- Compared to the original Yi-9B-200K, Fi-9B shows degraded code-completion ability, possibly due to the lack of raw code data during instruction tuning.
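Until the stopping issue is fixed, a hedged workaround is to cap the output length and pass an explicit EOS token. These are standard transformers generation options, not anything specific to this model; the snippet continues from the inference sketch above.

```python
# Generic workaround for the stopping issue: reuses model/tokenizer/inputs from
# the inference sketch above. Standard transformers options, nothing model-specific.
outputs = model.generate(
    inputs,
    max_new_tokens=512,                   # hard cap so generation always terminates
    eos_token_id=tokenizer.eos_token_id,  # stop at the tokenizer's EOS token
    pad_token_id=tokenizer.eos_token_id,  # avoid pad-token warnings during generation
)
```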