# English to Igbo
Author: iroro orife
## Data
- The JW300 English-Igbo dataset.
## Model
- Default Masakhane Transformer translation model.
- [Google Drive folder with the trained models](https://drive.google.com/drive/folders/1bVPKPkaivIT9k23ydbSlVj3Qwd3GJZf0)
## Analysis
The dataset needs further preprocessing to remove special characters and Scripture references (book names with chapter and verse numbers).
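
The cleanup described above could look roughly like the following. This is a minimal sketch, not the preprocessing actually used for this model: the `SCRIPTURE_REF` pattern, the `clean_line` helper, and the choice of which Unicode categories count as "special characters" are all assumptions made for illustration. The pattern is deliberately simple and will also strip things like clock times ("at 3:16"), so it would need tuning against the real JW300 text.

```python
import re
import unicodedata

# Hypothetical pattern for references such as "John 3 : 16" or "(1 Cor. 13 : 4 - 7)".
# It assumes a book-like word, optionally preceded by 1-3, then chapter:verse.
SCRIPTURE_REF = re.compile(
    r"\(?\b(?:[1-3]\s+)?[^\W\d_][^\s\d:]*\s+\d+\s*:\s*\d+(?:\s*[-,]\s*\d+)*\)?"
)


def _filter_char(ch: str) -> str:
    """Drop format characters, blank out symbols and control characters."""
    category = unicodedata.category(ch)
    if category == "Cf":        # zero-width and other invisible format characters
        return ""
    if category[0] in "CS":     # assumption: controls and symbols are "special"
        return " "
    return ch                   # letters, combining marks, digits, punctuation


def clean_line(text: str) -> str:
    """Remove scripture references and special characters from one line."""
    text = unicodedata.normalize("NFC", text)  # keep Igbo diacritics in one form
    text = SCRIPTURE_REF.sub(" ", text)
    text = "".join(_filter_char(ch) for ch in text)
    return " ".join(text.split())              # collapse leftover whitespace


# Made-up sample line, for illustration only.
print(clean_line("Read John 3 : 16 today ."))
```

Letters and combining marks are left untouched, so the tonal and orthographic diacritics in the Igbo side survive the cleanup.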
One very nice aspect of the Igbo translations is that the model predicts the proper tonal and orthographic diacritic forms.
This is a feature that Google Translate does not offer!
Example 1
```sh
Source: It’s not about the alcohol .
Reference: Nsogbu ya abụghị na ịṅụ mmanya na - aba n’anya na - agụ ya .
Hypothesis: Ọ bụghị banyere mmanya na - aba n’anya .
```
Example 2
```sh
Source: Is this also the case with your neighborhood ?
Reference: Ọ̀ bụ otú a ka ọ dịkwa n’agbata obi gị ?
Hypothesis: Nke a ọ̀ bụkwa ihe banyere ndị agbata obi gị ?
```
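
One way to make the diacritic observation measurable is to count the combining marks in a model output. The snippet below is a small sketch using only the Python standard library; `diacritic_counts` is a hypothetical helper, not part of the Masakhane tooling, and the string is the hypothesis from Example 2.

```python
import unicodedata


def diacritic_counts(text: str) -> dict:
    """Count combining marks in `text`, keyed by their Unicode names."""
    counts = {}
    # NFD splits precomposed letters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            name = unicodedata.name(ch)
            counts[name] = counts.get(name, 0) + 1
    return counts


hypothesis = "Nke a ọ̀ bụkwa ihe banyere ndị agbata obi gị ?"
for name, count in sorted(diacritic_counts(hypothesis).items()):
    print(f"{name}: {count}")
```

Running the same count on the reference and on the output of another system would give a rough comparison of how many tone marks and dotted vowels each one produces.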
## Results
Tokenization | BLEU dev | BLEU test
--- | --- | ---
BPE | 33.51 | 34.85
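
For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score is put together: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty. It is an illustrative, standard-library-only implementation that assumes whitespace tokenization and one reference per sentence. It is not the scoring script that produced the numbers in the table, and its results will generally differ from those of a full evaluation tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Made-up sentences, for illustration only.
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the red mat"]))
```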