Commit ccf68f4 by redrussianarmy
1 parent: 76f7408

Update README.md

Files changed (1)
  1. README.md +35 -2
README.md CHANGED
@@ -1,2 +1,35 @@
- In this repository we release (yet another) GPT-2 model, that was trained on various texts for German.
- In this repository, I release pretrained GPT-2 model, that was trained on various texts for Turkish
+ # Turkish GPT-2 Model
+
+ In this repository I release a GPT-2 model trained on various Turkish texts.
+
+ The model is meant to be an entry point for fine-tuning on other texts.
+
+ # Training corpora
+
+ I used a Turkish corpus taken from oscar-corpus.
+
+ The tokenizer is a byte-level BPE built with the Hugging Face Tokenizers library.
+
+ With that library, I created a 52K byte-level BPE vocab based on the training corpus.
+
+ After creating the vocab, I trained GPT-2 for Turkish on two RTX 2080 Ti GPUs over the complete training corpus (five epochs).
+
+ # Using the model
+
+ The model and its tokenizer can be loaded like this:
+
+ ``` python
+ # AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2 style models
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
+ model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")
+ ```
+
+ Here's an example that uses the Transformers text-generation pipeline:
+
+ ``` python
+ from transformers import pipeline
+ pipe = pipeline('text-generation', model="redrussianarmy/gpt2-turkish-cased",
+                 tokenizer="redrussianarmy/gpt2-turkish-cased")
+ # Turkish prompt, roughly: "While moving along the road toward evening, "
+ # max_length is passed at call time; `config` is meant for a model configuration
+ text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
+ print(text)
+ ```
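
The README's "Training corpora" section mentions a byte-level BPE vocab but does not show how such a vocab comes about. The following is a minimal pure-Python sketch of the idea, not the Tokenizers library's implementation and not the author's training script. It assumes a tiny made-up corpus, skips pre-tokenization, special tokens, and GPT-2's byte-to-unicode mapping, and the function names (`learn_bpe_merges`, `apply_merge`, `build_vocab`) are hypothetical.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace each occurrence of `pair` in `seq` with the concatenated token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe_merges(texts, num_merges):
    """Learn up to `num_merges` byte-pair merges from an iterable of strings."""
    # Every text starts as a sequence of single-byte tokens (UTF-8), so any
    # character, including Turkish letters such as "ş" or "ü", is representable.
    sequences = [[bytes([b]) for b in text.encode("utf-8")] for text in texts]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the corpus.
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best_pair, best_count = pair_counts.most_common(1)[0]
        if best_count < 2:
            break  # no pair repeats, so another merge would not compress anything
        merges.append(best_pair)
        sequences = [apply_merge(seq, best_pair) for seq in sequences]
    return merges


def build_vocab(merges):
    """Vocabulary = the 256 base bytes plus one token per learned merge."""
    vocab = [bytes([b]) for b in range(256)]
    vocab.extend(left + right for left, right in merges)
    return vocab


# Toy corpus (hypothetical); the README states the model used OSCAR Turkish data.
corpus = ["akşam akşam akşamüstü", "yolda yolda ilerlerken"]
merges = learn_bpe_merges(corpus, num_merges=10)
vocab = build_vocab(merges)
print(len(vocab), "tokens; first merge:", merges[0])
```

In this sketch the vocab size is 256 plus the number of merges, so a 52K vocab corresponds to running the same loop for roughly 52,000 merges over a much larger corpus.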