---
license: cc-by-4.0
language:
- nso
- en
pipeline_tag: text2text-generation
tags:
- m2m100
- translation
- africanlp
- african
- sepedi
- northern-sotho
---

# [nso-en] Northern Sotho [Sepedi] to English Translation Model based on M2M100 and The South African Gov-ZA multilingual corpus

Model created from Northern Sotho [Sepedi] to English aligned sentences from [The South African Gov-ZA multilingual corpus](https://github.com/dsfsi/gov-za-multilingual).

The data set contains cabinet statements from the South African government, maintained by the Government Communication and Information System (GCIS). Data was scraped from the government's website: https://www.gov.za/cabinet-statements

## Authors

- Vukosi Marivate - [@vukosi](https://twitter.com/vukosi)
- Matimba Shingange
- Richard Lastrucci
- Isheanesu Joseph Dzingirai
- Jenalea Rajab

## BibTeX entry and citation info

```
@inproceedings{lastrucci-etal-2023-preparing,
    title = "Preparing the Vuk{'}uzenzele and {ZA}-gov-multilingual {S}outh {A}frican multilingual corpora",
    author = "Richard Lastrucci and Isheanesu Dzingirai and Jenalea Rajab and Andani Madodonga and Matimba Shingange and Daniel Njini and Vukosi Marivate",
    booktitle = "Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.rail-1.3",
    pages = "18--25"
}
```

[Paper - Preparing the Vuk'uzenzele and ZA-gov-multilingual South African multilingual corpora](https://arxiv.org/abs/2303.03750)
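
## Usage sketch

Since the model is based on M2M100, it can presumably be loaded with the standard M2M100 classes from the `transformers` library. The following is a minimal sketch under stated assumptions, not a tested recipe from the authors: the model identifier is not given in this card and must be supplied by the user, and the source language code `ns` (the code the base M2M100 tokenizer uses for Northern Sotho) is assumed to be unchanged by fine-tuning. The `translate` helper is a hypothetical name introduced here for illustration.

```python
import sys

# Assumption: the fine-tuned model keeps the base M2M100 language codes,
# where Northern Sotho is "ns" and English is "en".
SRC_LANG = "ns"
TGT_LANG = "en"


def translate(text, model_id, src_lang=SRC_LANG, tgt_lang=TGT_LANG):
    """Translate `text` with an M2M100-style checkpoint (hypothetical helper).

    `model_id` is a Hub identifier or local path; this card does not state it,
    so the caller has to provide it.
    """
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_id)
    model = M2M100ForConditionalGeneration.from_pretrained(model_id)

    # Tell the tokenizer which language the input is written in.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")

    # Force the decoder to start with the target-language token.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python translate.py <model-id-or-path> "<Sepedi sentence>"
    print(translate(sys.argv[2], sys.argv[1]))
```

If the checkpoint was trained with a different source language code, pass it through the `src_lang` argument instead of relying on the default.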