English to Igbo
Author: Iroro Orife
Data
- The JW300 English-Igbo dataset.
Model
- Default Masakhane Transformer translation model.
- Link to Google Drive folder with models
Analysis
The dataset needs more preprocessing to remove special characters and Scripture chapter/verse names and numbers. One very nice aspect of the Igbo translations is that the model predicts the proper tonal and orthographic diacritic forms. This is not a feature that is available with Google Translate!
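As a rough illustration of the extra preprocessing mentioned above, here is a minimal sketch in Python. It is not part of the Masakhane pipeline: the `clean_line` helper, the `VERSE_REF` pattern, and the `SPECIAL_CHARS` set are all hypothetical and would need tuning against the actual JW300 text. It assumes tokenized input where punctuation is separated by spaces, and it normalizes to NFC so that Igbo diacritics are kept in a consistent form rather than stripped.

```python
import re
import unicodedata

# Hypothetical pattern for chapter:verse citations such as "3 : 16" or
# "5 : 3 - 12", written for tokenized text with space-separated punctuation.
VERSE_REF = re.compile(r"\b\d+\s*:\s*\d+(?:\s*[-,]\s*\d+)*")

# Characters assumed to be noise for this illustration only.
SPECIAL_CHARS = re.compile(r"[*#\[\]{}<>|\\^~_]")


def clean_line(line: str) -> str:
    """Drop verse references and stray symbols, keeping Igbo diacritics."""
    # NFC keeps precomposed letters (e.g. ụ, ọ) and any combining tone marks.
    line = unicodedata.normalize("NFC", line)
    line = VERSE_REF.sub(" ", line)
    line = SPECIAL_CHARS.sub(" ", line)
    # Collapse the whitespace left behind by the removals.
    return " ".join(line.split())


if __name__ == "__main__":
    print(clean_line("See John 3 : 16 * for details ."))
```

Note that a pattern this simple would also remove clock times such as "10 : 30" and leaves book names like "John" in place, so a real cleanup step would likely need a list of book names or a stricter pattern.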
Example 1
Source: It’s not about the alcohol .
Reference: Nsogbu ya abụghị na ịṅụ mmanya na - aba n’anya na - agụ ya .
Hypothesis: Ọ bụghị banyere mmanya na - aba n’anya .
Example 2
Source: Is this also the case with your neighborhood ?
Reference: Ọ̀ bụ otú a ka ọ dịkwa n’agbata obi gị ?
Hypothesis: Nke a ọ̀ bụkwa ihe banyere ndị agbata obi gị ?
Results
Tokenization | BLEU dev | BLEU test |
---|---|---|
BPE | 33.51 | 34.85 |