benjamin committed on
Commit 5b7a90a
1 Parent(s): 1ec98f3

max_length -> 100

Files changed (2)
  1. README.md +5 -3
  2. config.json +1 -1
README.md CHANGED

@@ -11,6 +11,8 @@ license: mit
 
 A large German GPT2.
 
+Also check out [GerPT2](https://huggingface.co/benjamin/gerpt2), a small version of this model.
+
 See the [GPT2 model card](https://huggingface.co/gpt2) for considerations on limitations and bias. See the [GPT2 documentation](https://huggingface.co/transformers/model_doc/gpt2.html) for details on GPT2.
 
 ## Comparison to [dbmdz/german-gpt2](https://huggingface.co/dbmdz/german-gpt2)

@@ -61,8 +63,8 @@ print(tokenizer.decode(output))
 
 ## Training details
 
-GerPT2 is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
-GerPT2 was trained with:
+GerPT2-large is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
+GerPT2-large was trained with:
 
 - a batch size of 256
 - using OneCycle learning rate with a maximum of 5e-3

@@ -71,7 +73,7 @@ GerPT2 was trained with:
 
 Training took roughly 12 days on 8 TPUv3 cores.
 
-To train GerPT2, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
+To train GerPT2-large, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
 
 0. Download and unzip training data from http://data.statmt.org/cc-100/.
 1. Train a tokenizer using `prepare/train_tokenizer.py`. As training data for the tokenizer I used a random subset of 5% of the CC-100 data.
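Step 1 above points at `prepare/train_tokenizer.py` in the linked repository. Purely as a rough sketch (not the actual script), training a GPT2-style byte-level BPE tokenizer on a CC-100 subset with the `tokenizers` library could look like the following; the file names and the 5% sampling are assumptions based on the README, and the vocabulary size matches the `vocab_size: 50257` in config.json below.

```python
import random

from tokenizers import ByteLevelBPETokenizer

# Sketch only, not the actual prepare/train_tokenizer.py script.
# Assumes the unzipped CC-100 German dump is a plain text file "de.txt"
# and that roughly 5% of its lines are sampled as tokenizer training data.
random.seed(0)
with open("de.txt", encoding="utf-8") as src, open("de_subset.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if random.random() < 0.05:
            dst.write(line)

# GPT2 uses a byte-level BPE vocabulary; 50257 matches vocab_size in config.json.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["de_subset.txt"],
    vocab_size=50257,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("tokenizer_out")  # writes vocab.json and merges.txt
```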
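The training details above mention a batch size of 256 and a OneCycle learning rate schedule with a maximum of 5e-3; the actual training code is in the GitHub repository. As a hedged PyTorch-style illustration only, the schedule could be set up as below; the optimizer choice, total step count and loop body are placeholders, not taken from the repo.

```python
import torch
from transformers import GPT2LMHeadModel

# Sketch of a OneCycle schedule with max_lr=5e-3, as stated in the README.
# Weights start from the English GPT2 model, as described above.
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-3)  # placeholder optimizer
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=5e-3, total_steps=100_000)

for step in range(100_000):
    # ... forward pass on a batch of 256 sequences, loss.backward() ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```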
config.json CHANGED

@@ -32,7 +32,7 @@
   "task_specific_params": {
     "text-generation": {
       "do_sample": true,
-      "max_length": 500
+      "max_length": 100
     }
   },
   "vocab_size": 50257