burakaytan committed ffd6166 (parent: 04a2ff4)

Update README.md

---
license: mit
language:
- tr
---
# 🇹🇷 RoBERTaTurkish
## Model description

This is a Turkish RoBERTa base model pretrained on Turkish Wikipedia, the Turkish portion of OSCAR, and text from some news websites.

The final training corpus is 38 GB and contains 329,720,508 sentences.

At Turkcell, we trained the model for 2.5M steps on a machine with an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz, 256 GB of RAM, and 2 × NVIDIA Tesla V100 PCIe 32 GB (GV100GL) GPUs.
## Usage

Load the tokenizer and model with the `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Downloads the tokenizer files and masked-LM weights from the Hugging Face Hub on first use.
tokenizer = AutoTokenizer.from_pretrained("burakaytan/roberta-base-turkish-uncased")
model = AutoModelForMaskedLM.from_pretrained("burakaytan/roberta-base-turkish-uncased")
```
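With the tokenizer and model loaded this way, you can score a masked position yourself instead of going through a pipeline. The sketch below assumes a PyTorch backend and exactly one `<mask>` in the input, and it reproduces what the `fill-mask` pipeline is understood to do in the single-mask case: a softmax over the vocabulary at the masked position followed by top-k. `predict_mask` and `softmax_top_k` are hypothetical helper names, not part of `transformers`.

```python
import math
from typing import List, Tuple


def softmax_top_k(logits: List[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return the k (token_id, probability) pairs with the highest softmax probability."""
    # Subtract the largest logit before exponentiating to avoid overflow.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


def predict_mask(text, tokenizer, model, k: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical helper: score candidate tokens for the single <mask> in `text`."""
    # torch is imported here so softmax_top_k stays usable without it installed.
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

    # Find the <mask> position; this sketch only supports exactly one mask.
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if len(mask_positions) != 1:
        raise ValueError("expected exactly one <mask> token")
    row = logits[0, mask_positions.item()].tolist()
    return [(tokenizer.decode([token_id]), prob) for token_id, prob in softmax_top_k(row, k)]
```

For example, `predict_mask("iki ülke arasında <mask> başladı", tokenizer, model)` should rank candidates the same way as the pipeline, assuming the pipeline applies softmax and top-k as described; the pipeline additionally handles batching and multiple masks.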
## Fill Mask Usage

Use the `fill-mask` pipeline to predict the token at the `<mask>` position. The example sentence means "<mask> began between the two countries", and the five predictions are savaş (war), müzakereler (negotiations), görüşmeler (talks), kriz (crisis), and çatışmalar (clashes):

```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="burakaytan/roberta-base-turkish-uncased",
    tokenizer="burakaytan/roberta-base-turkish-uncased",
)

fill_mask("iki ülke arasında <mask> başladı")
# Output:
[{'sequence': 'iki ülke arasında savaş başladı',
  'score': 0.3013845384120941,
  'token': 1359,
  'token_str': ' savaş'},
 {'sequence': 'iki ülke arasında müzakereler başladı',
  'score': 0.1058429479598999,
  'token': 30439,
  'token_str': ' müzakereler'},
 {'sequence': 'iki ülke arasında görüşmeler başladı',
  'score': 0.07718811184167862,
  'token': 4916,
  'token_str': ' görüşmeler'},
 {'sequence': 'iki ülke arasında kriz başladı',
  'score': 0.07174749672412872,
  'token': 3908,
  'token_str': ' kriz'},
 {'sequence': 'iki ülke arasında çatışmalar başladı',
  'score': 0.05678590387105942,
  'token': 19346,
  'token_str': ' çatışmalar'}]
```
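Each prediction is a dict with `sequence`, `score`, `token`, and `token_str` keys, and `token_str` keeps the leading space that marks a word-initial BPE token (e.g. `' savaş'`). A minimal sketch of a hypothetical helper, `summarize_predictions`, that turns this list into clean `(word, score)` pairs and drops low-confidence candidates; the threshold value is an illustrative assumption, not a recommendation from the model authors.

```python
from typing import Any, Dict, List, Tuple


def summarize_predictions(
    predictions: List[Dict[str, Any]], min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Hypothetical helper: fill-mask output -> (word, score) pairs, highest score first."""
    # token_str starts with a space for word-initial tokens (e.g. ' savaş'), so strip it.
    pairs = [
        (p["token_str"].strip(), p["score"])
        for p in predictions
        if p["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

With the scores in the output above, `summarize_predictions(fill_mask("iki ülke arasında <mask> başladı"), min_score=0.07)` keeps savaş, müzakereler, görüşmeler, and kriz. The pipeline returns five candidates by default; recent `transformers` versions accept a `top_k` argument on the call to change that.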