# Turkish GPT-2 Model In this repository I release GPT-2 model, that was trained on various texts for Turkish. The model is meant to be an entry point for fine-tuning on other texts. ## Training corpora I used a Turkish corpora that is taken from oscar-corpus. It was possible to create byte-level BPE with Tokenizers library of Huggingface. With the Tokenizers library, I created a 52K byte-level BPE vocab based on the training corpora. After creating the vocab, I could train the GPT-2 for Turkish on two 2080TI over the complete training corpus (five epochs). ## Model weights Both PyTorch and Tensorflow compatible weights are available. | Model | Downloads | --------------------------------- | --------------------------------------------------------------------------------------------------------------- | `dbmdz/bert-base-turkish-cased` | [`config.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/config.json) • [`merges.txt`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/merges.txt) • [`pytorch_model.bin`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/pytorch_model.bin) • [`special_tokens_map.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/special_tokens_map.json) • [`tf_model.h5`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/tf_model.h5) • [`tokenizer_config.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/tokenizer_config.json) • [`traning_args.bin`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/training_args.bin) • [`vocab.json`](https://huggingface.co/redrussianarmy/gpt2-turkish-cased/blob/main/vocab.json) ## Using the model The model itself can be used in this way: ``` python from transformers import AutoTokenizer, AutoModelWithLMHead tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased") model = AutoModelWithLMHead.from_pretrained("redrussianarmy/gpt2-turkish-cased") ``` Here's an example that shows how to use the great Transformers Pipelines for generating text: ``` python from transformers import pipeline pipe = pipeline('text-generation', model="redrussianarmy/gpt2-turkish-cased", tokenizer="redrussianarmy/gpt2-turkish-cased", config={'max_length':800}) text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"] print(text) ``` ## Contact (Bugs, Feedback, Contribution and more) For questions about the GPT2-Turkish model, just open an issue [here](https://github.com/redrussianarmy/gpt2-turkish/issues) 🤗