jieliu committed
Commit 48ddfee
1 parent: 43ecab1

Update README.md

Files changed (1)
  1. README.md +17 -15
README.md CHANGED
@@ -1,16 +1,19 @@
----
-
+---
 license: apache-2.0
+library_name: transformers
+tags:
+- storm
+- mistral
+- openchat
+- RLAIF
+- reward model
+---
 
----
-
-### Storm-7B
-
-> **Developed by**: [Jie Liu](https://jieliu.site/)$^{*1,2}$, [Zhanhui Zhou](https://scholar.google.com/citations?user=SbACfYQAAAAJ&hl=zh-CN)$^{*2}$, [Chao Yang](https://scholar.google.com/citations?user=5KRbHPMAAAAJ&hl=zh-CN)$^{2}$, [Han-Sen Zhong](https://scholar.google.com.hk/citations?user=X_ZfX8sAAAAJ&hl=zh-CN)$^{2}$, and [Wanli Ouyang](https://wlouyang.github.io/)$^{1,2}$.
->
-> $^{1}$MMLab, The Chinese University of Hong Kong $^{2}$Shanghai AI Laboratory
+# Storm-7B
+- **Developed by**: [Jie Liu](https://jieliu.site/) \\(^{*1,2}\\), [Zhanhui Zhou](https://scholar.google.com/citations?user=SbACfYQAAAAJ&hl=zh-CN) \\(^{*2}\\), [Chao Yang](https://scholar.google.com/citations?user=5KRbHPMAAAAJ&hl=zh-CN) \\(^{2}\\), [Han-Sen Zhong](https://scholar.google.com.hk/citations?user=X_ZfX8sAAAAJ&hl=zh-CN) \\(^{2}\\), and [Wanli Ouyang](https://wlouyang.github.io/) \\(^{1,2}\\).
+- \\(^{1}\\)MMLab, The Chinese University of Hong Kong   \\(^{2}\\)Shanghai AI Laboratory
 
-#### Introduction
+## Introduction
 
 We released Storm-7B, the first open-source language model comparable to the GPT-4 series on the [AlpacaEval 2.0](https://tatsu-lab.github.io/alpaca_eval/) leaderboard, ranking 3rd in length-controlled win rate.
 
@@ -45,7 +48,7 @@ We also conducted preliminary evaluations on other benchmarks and observed no si
 | Mistral-7B-v0.1 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 65.59 |
 | Qwen-7b | 51.37 | 78.47 | 59.84 | 47.79 | 72.69 | 62.03 |
 
-#### Uses
+## Uses
 
 Our model uses the same chat template as [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106). A sample code snippet for inference using our model is provided below.
 
@@ -78,11 +81,11 @@ response_text = generate_response(input_prompt)
 print("Response:", response_text)
 ```
 
-#### Limitations
+## Limitations
 
 Storm-7B is a quick demonstration that a language model, fine-tuned with AI feedback, can easily surpass or match state-of-the-art models, as assessed by the same AI feedback. However, this improvement on the automatic leaderboard may not necessarily indicate better alignment with human intentions. Our model therefore represents a critical, preliminary reevaluation of the RLAIF paradigm, questioning how much learning from and being evaluated by AI feedback aligns with actual human preferences.
 
-#### Citation
+## Citation
 
 ```
 @misc{liu2024storm,
@@ -92,5 +95,4 @@ Storm-7B is a quick demonstration that a language model, fine-tuned with AI feed
   month = {April},
   year = {2024}
 }
-```
-
+```
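
The Uses hunk above notes that the model shares its chat template with Openchat-3.5-0106, which wraps each user turn as `GPT4 Correct User: …<|end_of_turn|>` followed by the `GPT4 Correct Assistant:` generation tag. A minimal sketch of building such a single-turn prompt by hand (the helper name is ours; in practice, `tokenizer.apply_chat_template` on the released checkpoint is the authoritative source for the template):

```python
def build_openchat_prompt(user_message: str) -> str:
    # OpenChat-3.5-0106-style template, single user turn:
    # the <|end_of_turn|> token closes the user turn, and the
    # "GPT4 Correct Assistant:" tag cues the model's reply.
    return (
        f"GPT4 Correct User: {user_message}<|end_of_turn|>"
        "GPT4 Correct Assistant:"
    )

prompt = build_openchat_prompt("Hello!")
print(prompt)
# → GPT4 Correct User: Hello!<|end_of_turn|>GPT4 Correct Assistant:
```

This string form is only a sketch; feeding it to the model still requires the checkpoint's own tokenizer so that `<|end_of_turn|>` is encoded as a special token rather than plain text.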