---
license: cc-by-sa-4.0

language: 
- de
- en
- es
- fr
- it
- ja
- ru
- uk

tags:
- translation

---

# TakoMT

TakoMT is a multilingual-to-Japanese translation model built with Marian-NMT.
For more details, please see [my repository](https://github.com/s-taka/fugumt).

In addition to the data listed in the repository, I also used [ParaCrawl](https://paracrawl.eu/).

* source languages: de, en, es, fr, it, ru, uk 
* target language: ja 


### How to use

This model requires `transformers` and `sentencepiece`. In a notebook, install them with:
```python
!pip install transformers sentencepiece
```

You can use this model directly with a pipeline:

```python
from transformers import pipeline
tako_translator = pipeline('translation', model='staka/takomt')
tako_translator('This is a cat.')
```
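
To translate several sentences at once, the pipeline can also be called with a list of strings. The following is a minimal sketch and not part of the original usage instructions: `load_translator` and `translate_all` are hypothetical helper names, and the batch size of 8 is an arbitrary assumption. It relies on the translation pipeline returning one dict with a `translation_text` key per input sentence.

```python
def load_translator(model_name="staka/takomt"):
    """Build the translation pipeline (downloads the model on first use)."""
    # Imported lazily so translate_all can be used without loading transformers.
    from transformers import pipeline
    return pipeline("translation", model=model_name)


def translate_all(translator, sentences, batch_size=8):
    """Hypothetical helper: translate a list of sentences, return plain strings."""
    sentences = list(sentences)
    if not sentences:
        return []
    # A translation pipeline returns a list of {"translation_text": ...} dicts.
    outputs = translator(sentences, batch_size=batch_size)
    return [item["translation_text"] for item in outputs]


# Example usage (requires network access to fetch the model):
# tako_translator = load_translator()
# translate_all(tako_translator, ["This is a cat.", "Das ist eine Katze."])
```

Because all listed source languages share the same model, inputs in different source languages can be mixed in one call.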

### Eval results

The evaluation results on 500 randomly selected sentences from [Tatoeba](https://tatoeba.org/ja) are as follows:

|source |target |BLEU(*1)| 
|-------|-------|--------|
|de     |ja     |27.8    |
|en     |ja     |28.4    |
|es     |ja     |32.0    |
|fr     |ja     |27.9    |
|it     |ja     |24.3    |
|ru     |ja     |27.3    |
|uk     |ja     |29.8    |


(*1) Computed with `sacrebleu --tokenize ja-mecab`.
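
A comparable evaluation could be sketched roughly as follows. This is an illustration under assumptions, not the script used to produce the table: `sample_pairs` and `bleu_ja` are hypothetical helper names, the seed is arbitrary, and how the original 500 sentences were drawn is not documented here. Scoring with the `ja-mecab` tokenizer requires the optional Japanese dependencies (`pip install "sacrebleu[ja]"`).

```python
import random


def sample_pairs(pairs, k=500, seed=0):
    """Draw up to k (source, reference) pairs reproducibly; the seed is an assumption."""
    pairs = list(pairs)
    if len(pairs) <= k:
        return pairs
    # Sampling without replacement from a seeded generator keeps runs repeatable.
    return random.Random(seed).sample(pairs, k)


def bleu_ja(hypotheses, references):
    """Corpus-level BLEU with MeCab tokenization, matching `--tokenize ja-mecab`."""
    import sacrebleu  # imported lazily; needs the sacrebleu[ja] extra
    # corpus_bleu expects a list of reference streams, hence [references].
    return sacrebleu.corpus_bleu(hypotheses, [references], tokenize="ja-mecab").score
```

With a translator pipeline at hand, the hypotheses would be the model outputs for the sampled source sentences, and the references the corresponding Japanese sentences.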