max_length -> 100

Changed files:
- README.md (+5 -3)
- config.json (+1 -1)
README.md CHANGED

@@ -11,6 +11,8 @@ license: mit
 
 A large German GPT2.
 
+Also check out [GerPT2](https://huggingface.co/benjamin/gerpt2), a small version of this model.
+
 See the [GPT2 model card](https://huggingface.co/gpt2) for considerations on limitations and bias. See the [GPT2 documentation](https://huggingface.co/transformers/model_doc/gpt2.html) for details on GPT2.
 
 ## Comparison to [dbmdz/german-gpt2](https://huggingface.co/dbmdz/german-gpt2)
@@ -61,8 +63,8 @@ print(tokenizer.decode(output))
 
 ## Training details
 
-GerPT2 is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
-GerPT2 was trained with:
+GerPT2-large is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
+GerPT2-large was trained with:
 
 - a batch size of 256
 - using OneCycle learning rate with a maximum of 5e-3
@@ -71,7 +73,7 @@ GerPT2 was trained with:
 
 Training took roughly 12 days on 8 TPUv3 cores.
 
-To train GerPT2, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
+To train GerPT2-large, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
 
 0. Download and unzip training data from http://data.statmt.org/cc-100/.
 1. Train a tokenizer using `prepare/train_tokenizer.py`. As training data for the tokenizer I used a random subset of 5% of the CC-100 data.
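The training details above specify a OneCycle learning-rate schedule with a maximum of 5e-3. As an illustration only, such a schedule could be set up with PyTorch's built-in `OneCycleLR`; the optimizer choice and total step count below are placeholders, not values taken from the scripts in the linked repository.

```python
import torch
from transformers import GPT2LMHeadModel

# Illustrative setup: the README only fixes the schedule type (OneCycle)
# and the peak learning rate (5e-3); the optimizer and step count here
# are assumptions, not taken from the GerPT2 training scripts.
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-3)
total_steps = 100_000  # placeholder

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-3,  # "maximum of 5e-3" from the training details
    total_steps=total_steps,
)

# Inside the training loop the scheduler is stepped once per batch:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```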
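Step 1 of the training steps refers to `prepare/train_tokenizer.py` in the repository. The sketch below is not that script; it is a generic byte-level BPE training example with the `tokenizers` library, using the vocabulary size from config.json (50257) and a hypothetical text file standing in for the 5% CC-100 subset.

```python
from tokenizers import ByteLevelBPETokenizer

# Generic sketch, not the repository's prepare/train_tokenizer.py.
# "cc100_de_subset.txt" is a hypothetical file holding the random 5%
# subset of the German CC-100 data mentioned in the README.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["cc100_de_subset.txt"],
    vocab_size=50257,                  # matches vocab_size in config.json
    special_tokens=["<|endoftext|>"],  # GPT2's end-of-text token
)
tokenizer.save_model("german_tokenizer")  # writes vocab.json and merges.txt
```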
config.json CHANGED

@@ -32,7 +32,7 @@
   "task_specific_params": {
     "text-generation": {
       "do_sample": true,
-      "max_length":
+      "max_length": 100
     }
   },
   "vocab_size": 50257
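The `task_specific_params` block provides per-task defaults that the `text-generation` pipeline applies when the model is loaded, so after this change sampled generations run up to 100 tokens by default. A minimal sketch, assuming the repository ID is `benjamin/gerpt2-large` and using an arbitrary German prompt:

```python
from transformers import pipeline

# The text-generation pipeline picks up do_sample=True and max_length=100
# from task_specific_params in config.json as its generation defaults.
generator = pipeline("text-generation", model="benjamin/gerpt2-large")  # assumed repo ID
# Example prompt: "The meaning of life is"
print(generator("Der Sinn des Lebens ist")[0]["generated_text"])
```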