max_length -> 100

Changed files:
- README.md (+5 -3)
- config.json (+1 -1)
README.md CHANGED

@@ -11,6 +11,8 @@ license: mit
 
 A large German GPT2.
 
+Also check out [GerPT2](https://huggingface.co/benjamin/gerpt2), a small version of this model.
+
 See the [GPT2 model card](https://huggingface.co/gpt2) for considerations on limitations and bias. See the [GPT2 documentation](https://huggingface.co/transformers/model_doc/gpt2.html) for details on GPT2.
 
 ## Comparison to [dbmdz/german-gpt2](https://huggingface.co/dbmdz/german-gpt2)
@@ -61,8 +63,8 @@ print(tokenizer.decode(output))
 
 ## Training details
 
-GerPT2 is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
-GerPT2 was trained with:
+GerPT2-large is trained on the entire German data (67GB) from the [CC-100 Corpus](http://data.statmt.org/cc-100/) and weights were initialized from the [English GPT2 model](https://huggingface.co/gpt2-large).
+GerPT2-large was trained with:
 
 - a batch size of 256
 - using OneCycle learning rate with a maximum of 5e-3
@@ -71,7 +73,7 @@ GerPT2 was trained with:
 
 Training took roughly 12 days on 8 TPUv3 cores.
 
-To train GerPT2, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
+To train GerPT2-large, follow these steps. Scripts are located in the [Github repository](https://github.com/bminixhofer/gerpt2):
 
 0. Download and unzip training data from http://data.statmt.org/cc-100/.
 1. Train a tokenizer using `prepare/train_tokenizer.py`. As training data for the tokenizer I used a random subset of 5% of the CC-100 data.
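The training details above specify a OneCycle learning-rate schedule with a maximum of 5e-3. As an illustration only, such a schedule could be set up with PyTorch's built-in `OneCycleLR`; the optimizer choice and total step count below are placeholders, not values taken from the scripts in the linked repository.

```python
import torch
from transformers import GPT2LMHeadModel

# Illustrative setup: the README only fixes the schedule type (OneCycle)
# and the peak learning rate (5e-3); the optimizer and step count here
# are assumptions, not taken from the GerPT2 training scripts.
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-3)
total_steps = 100_000  # placeholder

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-3,  # "maximum of 5e-3" from the training details
    total_steps=total_steps,
)

# Inside the training loop the scheduler is stepped once per batch:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```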
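Step 1 of the training steps refers to `prepare/train_tokenizer.py` in the repository. The sketch below is not that script; it is a generic byte-level BPE training example with the `tokenizers` library, using the vocabulary size from config.json (50257) and a hypothetical text file standing in for the 5% CC-100 subset.

```python
from tokenizers import ByteLevelBPETokenizer

# Generic sketch, not the repository's prepare/train_tokenizer.py.
# "cc100_de_subset.txt" is a hypothetical file holding the random 5%
# subset of the German CC-100 data mentioned in the README.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["cc100_de_subset.txt"],
    vocab_size=50257,                  # matches vocab_size in config.json
    special_tokens=["<|endoftext|>"],  # GPT2's end-of-text token
)
tokenizer.save_model("german_tokenizer")  # writes vocab.json and merges.txt
```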
config.json CHANGED

@@ -32,7 +32,7 @@
   "task_specific_params": {
     "text-generation": {
       "do_sample": true,
-      "max_length":
+      "max_length": 100
     }
   },
   "vocab_size": 50257
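The `task_specific_params` block provides per-task defaults that the `text-generation` pipeline applies when the model is loaded, so after this change sampled generations run up to 100 tokens by default. A minimal sketch, assuming the repository ID is `benjamin/gerpt2-large` and using an arbitrary German prompt:

```python
from transformers import pipeline

# The text-generation pipeline picks up do_sample=True and max_length=100
# from task_specific_params in config.json as its generation defaults.
generator = pipeline("text-generation", model="benjamin/gerpt2-large")  # assumed repo ID
# Example prompt: "The meaning of life is"
print(generator("Der Sinn des Lebens ist")[0]["generated_text"])
```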