# English to Arabic
Authors:
* Abdallah Bashir
* Amr Muhammad ALAMEEN Khalifa
## Data
* The JW300 English-Arabic (bin) dataset.
* The [TED-Multilingual-Parallel-Corpus](https://github.com/ajinkyakulkarni14/TED-Multilingual-Parallel-Corpus) English-Arabic dataset.
## Test Data
Unlike the rest of the baselines, the test data files used to evaluate the model were not taken from the repo. Instead, they were held out as a portion of the merged datasets, with the same number of entries as test.en-any.en.
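The hold-out step described above could look roughly like the following. This is a minimal sketch, not the authors' actual script: the function name `hold_out_test_set`, the toy sentence pairs, the random shuffle, and the seed are all assumptions made for illustration. The only detail taken from the text is that the test portion is drawn from the merged data and sized to match test.en-any.en.

```python
import random


def hold_out_test_set(pairs, test_size, seed=42):
    """Split merged (source, target) pairs into train and test portions.

    Hypothetical helper: shuffling with a fixed seed is an assumption,
    the original split may have been taken differently.
    """
    if test_size > len(pairs):
        raise ValueError("test_size exceeds the number of available pairs")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # The first test_size pairs become the test set, the rest is training data.
    return shuffled[test_size:], shuffled[:test_size]


# Toy stand-ins for the JW300 and TED sentence pairs (hypothetical data).
jw300_pairs = [(f"jw en {i}", f"jw ar {i}") for i in range(6)]
ted_pairs = [(f"ted en {i}", f"ted ar {i}") for i in range(4)]
merged = jw300_pairs + ted_pairs

# In practice test_size would be the number of lines in test.en-any.en.
train, test = hold_out_test_set(merged, test_size=3)
```

Keeping source and target together as tuples while shuffling ensures the two sides of each sentence pair stay aligned after the split.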
## Model
- Default Masakhane Transformer translation model.
- [Link to google drive folder with models](https://drive.google.com/drive/folders/18P6HH9wavVpaR3UufoiUsTeqMnkvc1He)
## Analysis
The dataset requires more preprocessing to remove special characters and Scripture chapter/verse names and figures. It is also very small, which is the primary factor limiting how much the model can learn.
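As a rough illustration of the extra preprocessing mentioned above, the sketch below strips chapter:verse figures and stray symbols from a tokenized line. The patterns and the name `clean_line` are assumptions for illustration only. They are a heuristic (for example, clock times such as "10 : 30" would also be removed) and do not handle Scripture book names.

```python
import re

# Assumed pattern for chapter:verse figures such as "3 : 16" or "13 : 4 - 7".
VERSE_FIGURES = re.compile(r"\d+\s*:\s*\d+(?:\s*[-,]\s*\d+)*")
# Anything that is not a word character (letters in any script, digits,
# underscore), whitespace, or basic punctuation is treated as "special".
SPECIAL_CHARS = re.compile(r"[^\w\s.,;:!?'\"()-]")
MULTISPACE = re.compile(r"\s+")


def clean_line(text):
    """Remove verse figures and special characters, then collapse spaces."""
    text = VERSE_FIGURES.sub(" ", text)
    text = SPECIAL_CHARS.sub(" ", text)
    return MULTISPACE.sub(" ", text).strip()


print(clean_line("see john 3 : 16 - 18 for details ★"))
```

Applying the same function to both sides of each sentence pair, and dropping pairs where either side becomes empty, would keep the corpus aligned.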
Example 1
```ar
Source: at the same time , the police gave free passage to busloads of mkalavishvili’s followers , who were bent on destroying the convention site .
Reference: ููู ุงูููุช ููุณู โ ูุชุญุช ุงูุดุฑุทู ุงูุทุฑูู ูุจุงุตุงุช ุงุฎุฑู ุชููู ุงุชุจุงุน ู
ูุงูุงฺคูุดฺคููู ุงูุฐูู ูุงููุง ู
ุตู
ูู ุนูู ุชุฏู
ูุฑ ู
ููุน ุงูู
ุญูู โ
Hypothesis: ููู ุงูููุช ููุณู โ ุงุนุทู ุงูุดุฑุทู ู
ูุทุน ู
ุฌุงูู ููุซูุฑ ู
ู ุงุชุจุงุน ู
ุงููุงฺูคฺูููคฺูููคูู โ ุงูุฐูู ูุงููุง ู
ูุฒุนุฌูู ูู ุชุฏู
ูุฑ ู
ููุน ุงูู
ุญูู โ
```
Example 2
```ar
Source: a big attraction was the man roland lithoman web - offset press that prints up to 90,000 magazines an hour .
Reference: ูู
ุง ููุช ุงูุชุจุงู ุงูุฒูุงุฑ ุงูู ุญุฏ ูุจูุฑ ูู ู
ุทุจุนู ุงููุจ ุงููุณุช ุงูู
ุชุทูุฑู ุฌุฏุง โ man roland lithomanโ โ ุงูุชู ูู
ูู ุงู ุชุทุจุน ู โูฉู ู
ุฌูู ูู ุงูุณุงุนู โ
Hypothesis: ูุงู ุฌุฐุจ ูุจูุฑ ูู ุงูุตุญุงูู ุงูุฑูู
ุงููู ููุชูู
ุงุงู โ ุงูุชู ุชุทูู ุงูู ู โูฉู ู
ุฌูู ูู ุงูุณุงุนู โ
```
## Results
Tokenization | BLEU dev | BLEU test
--- | --- | ---
BPE | 15.45 | 9.28