# English to Arabic

Authors:

* Abdallah Bashir
* Amr Muhammad ALAMEEN Khalifa

## Data

* The JW300 English-Arabic (bin) dataset.
* The [TED-Multilingual-Parallel-Corpus](https://github.com/ajinkyakulkarni14/TED-Multilingual-Parallel-Corpus) English-Arabic dataset.

## Test Data

Unlike the other baselines, the test data for evaluating this model were not taken from the repo. Instead, they were held out from the merged datasets, with the same number of entries as `test.en-any.en`.
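
The hold-out step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the function name `hold_out_test_set`, the fixed seed, and the toy sentences are all assumptions made for the example.

```python
import random


def hold_out_test_set(src_lines, tgt_lines, test_size, seed=42):
    """Split aligned sentence pairs into (train_pairs, test_pairs).

    Hypothetical helper: the name, the seed, and the shuffling strategy
    are illustrative assumptions, not taken from the original pipeline.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must be aligned line by line")
    if not 0 <= test_size <= len(src_lines):
        raise ValueError("test_size must be between 0 and the corpus size")
    pairs = list(zip(src_lines, tgt_lines))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs[test_size:], pairs[:test_size]


# Toy stand-ins for the merged JW300 + TED corpus (made-up data).
merged_en = [f"english sentence {i}" for i in range(10)]
merged_ar = [f"arabic sentence {i}" for i in range(10)]

# In practice, test_size would be the number of lines in test.en-any.en.
train_pairs, test_pairs = hold_out_test_set(merged_en, merged_ar, test_size=3)
print(len(train_pairs), len(test_pairs))
```

Shuffling the pairs together, rather than each side separately, keeps every English sentence aligned with its Arabic counterpart. Duplicate sentence pairs are not removed here, so a real split may also need deduplication to keep test sentences out of the training data.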

## Model

- Default Masakhane Transformer translation model.
- [Google Drive folder with models](https://drive.google.com/drive/folders/18P6HH9wavVpaR3UufoiUsTeqMnkvc1He)

## Analysis

The dataset requires more preprocessing to remove special characters and Scripture chapter/verse names and figures. It is also very small, which is the primary limiting factor on learning anything useful.
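
As a rough idea of what such preprocessing could look like on the English side, here is a minimal sketch. The regular expressions are assumptions about what counts as a Scripture citation (for example `John 3:16` or `1 Cor. 13:4-7`) and as a "special character"; they would need tuning against the real data, and the Arabic side would need its own patterns.

```python
import re

# Assumed shape of a Scripture citation: optional book number, book name,
# then chapter:verse with an optional verse range (e.g. "1 Cor. 13:4-7").
VERSE_REF = re.compile(r"(?:\b[1-3]\s)?\b[A-Z][a-z]+\.?\s\d+:\d+(?:\s?-\s?\d+)?")
# Assumed set of special characters to strip; adjust to the actual corpus.
SPECIAL = re.compile(r"[*#@^_~|<>\[\]{}\\]")
WHITESPACE = re.compile(r"\s+")


def clean_english(line):
    """Remove citation-like spans and special characters from one line."""
    line = VERSE_REF.sub(" ", line)
    line = SPECIAL.sub(" ", line)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE.sub(" ", line).strip()


print(clean_english("Read John 3:16 today."))
print(clean_english("1 Cor. 13:4-7 says love is patient"))
```

The function rewrites lines in place instead of dropping them, so the English and Arabic files stay aligned line by line. The citation pattern relies on capitalised book names, so it would have to run before any lowercasing step.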

Example 1

```text
Source: at the same time , the police gave free passage to busloads of mkalavishvili’s followers , who were bent on destroying the convention site .
Reference: ููู ุงูููุช ููุณู โ ูุชุญุช ุงูุดุฑุทู ุงูุทุฑูู ูุจุงุตุงุช ุงุฎุฑู ุชููู ุงุชุจุงุน ู
ูุงูุงฺคูุดฺคููู ุงูุฐูู ูุงููุง ู
ุตู
ูู ุนูู ุชุฏู
ูุฑ ู
ููุน ุงูู
ุญูู โ |
Hypothesis: ููู ุงูููุช ููุณู โ ุงุนุทู ุงูุดุฑุทู ู
ูุทุน ู
ุฌุงูู ููุซูุฑ ู
ู ุงุชุจุงุน ู
ุงููุงฺูคฺูููคฺูููคูู โ ุงูุฐูู ูุงููุง ู
ูุฒุนุฌูู ูู ุชุฏู
ูุฑ ู
ููุน ุงูู
ุญูู โ |
```

Example 2

```text
Source: a big attraction was the man roland lithoman web - offset press that prints up to 90,000 magazines an hour .
Reference: ูู
ุง ููุช ุงูุชุจุงู ุงูุฒูุงุฑ ุงูู ุญุฏ ูุจูุฑ ูู ู
ุทุจุนู ุงููุจ ุงููุณุช ุงูู
ุชุทูุฑู ุฌุฏุง โ man roland lithomanโ โ ุงูุชู ูู
ูู ุงู ุชุทุจุน ู โูฉู ู
ุฌูู ูู ุงูุณุงุนู โ |
Hypothesis: ูุงู ุฌุฐุจ ูุจูุฑ ูู ุงูุตุญุงูู ุงูุฑูู
ุงููู ููุชูู
ุงุงู โ ุงูุชู ุชุทูู ุงูู ู โูฉู ู
ุฌูู ูู ุงูุณุงุนู โ |
```

## Results

Tokenization | BLEU dev | BLEU test
--- | --- | ---
BPE | 15.45 | 9.28