jtatman committed on
Commit
fc55243
1 Parent(s): 114a265

Filled out card

Files changed (1)
  1. README.md +13 -5
README.md CHANGED
@@ -13,21 +13,29 @@ should probably proofread and complete it, then remove this comment. -->
 
  # hh-rlhf
 
- This model is a fine-tuned version of [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 2.1534
 
  ## Model description
 
- More information needed
-
  ## Intended uses & limitations
 
- More information needed
 
  ## Training and evaluation data
 
- More information needed
 
  ## Training procedure
 
 
  # hh-rlhf
 
+ This model is a fine-tuned version of [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1) on a subset (15k examples) of the Anthropic/hh-rlhf dataset.
  It achieves the following results on the evaluation set:
  - Loss: 2.1534
 
  ## Model description
 
+ GPT2 open instruct was fully trained on the open-instruct dataset. This model reimagines one LM head as a partial RLHF adapter, with subtle reinforcements.
  ## Intended uses & limitations
 
+ Intended for studying the intersection of instruct models and prompting, with a focus on subtle prompt exchanges. The model probably still needs substantial refinement.
 
  ## Training and evaluation data
 
+ Train dataset size: 15000
+ Test dataset size: 500
+ Dataset({
+     features: ['chosen', 'rejected'],
+     num_rows: 15000
+ })
+ Dataset({
+     features: ['chosen', 'rejected'],
+     num_rows: 500
+ })
 
  ## Training procedure
 
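The card reports an evaluation loss of 2.1534 without stating its unit. Assume it is the mean per-token cross-entropy in nats, which is the usual meaning of "Loss" in auto-generated Hugging Face model cards, although this card does not confirm it. Under that assumption the loss can be read as a perplexity via exp(loss). The sketch below shows the conversion:

```python
import math


def perplexity_from_loss(mean_ce_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_ce_nats)


# Assumption: the card's eval loss is mean token-level cross-entropy in nats.
eval_loss = 2.1534
ppl = perplexity_from_loss(eval_loss)
print(f"perplexity = {ppl:.2f}")  # perplexity = 8.61
```

If the loss were reported in bits instead, the conversion would be 2 ** loss. That alternative is one reason the unit matters when comparing against other checkpoints.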
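The card lists a 15,000-row train subset and a 500-row test subset of Anthropic/hh-rlhf, each with `chosen` and `rejected` columns, but it does not say how those rows were drawn. The sketch below is one reproducible way to draw such subsets. It assumes a seeded random sample without replacement from each split; the helper `take_subset` and the seed value are hypothetical, not the author's procedure.

```python
import random
from typing import Dict, List

# One hh-rlhf example: {"chosen": <transcript>, "rejected": <transcript>}
Row = Dict[str, str]


def take_subset(rows: List[Row], n: int, seed: int = 42) -> List[Row]:
    """Return n rows sampled without replacement, reproducibly for a given seed.

    Hypothetical helper: the card does not document how its subsets were drawn.
    """
    if n > len(rows):
        raise ValueError(f"requested {n} rows but only {len(rows)} available")
    rng = random.Random(seed)                   # local RNG; global state untouched
    indices = rng.sample(range(len(rows)), n)   # n distinct indices
    return [rows[i] for i in indices]


if __name__ == "__main__":
    # Toy stand-in rows; real rows would come from the Anthropic/hh-rlhf splits.
    train_rows = [{"chosen": f"c{i}", "rejected": f"r{i}"} for i in range(20_000)]
    test_rows = [{"chosen": f"c{i}", "rejected": f"r{i}"} for i in range(1_000)]
    train = take_subset(train_rows, 15_000)
    test = take_subset(test_rows, 500)
    print(len(train), len(test))  # 15000 500
```

The `Dataset({...})` output in the card suggests the author used the Hugging Face `datasets` library. With that library, the equivalent would be `load_dataset("Anthropic/hh-rlhf", split="train").shuffle(seed=42).select(range(15000))`, again with an assumed seed.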