---
datasets:
- SLPG/Punjabi_Transliteration_Corpus
language:
- pa
metrics:
- bleu
- cer
library_name: fairseq
pipeline_tag: translation
tags:
- punjabi shahmukhi
- punjabi gurmukhi
- transliteration
- punjabi transliteration
- punjabi gur to shahmukhi
- transliteration system
- punjabi transliteration system
---

### Punjabi Gurmukhi to Shahmukhi Transliteration System

Our Punjabi transliteration systems are supervised, bidirectional NMT systems, built using an unsupervised corpus, that convert text between the Gurmukhi and Shahmukhi scripts. The Gurmukhi-to-Shahmukhi model achieves a BLEU score of 98.1 and a word-level accuracy of 99.5%, while the Shahmukhi-to-Gurmukhi model achieves a BLEU score of 87.7.

## Corpus Details

- **Total Sentences:** 6.3 million
- **Domains Covered:** Various domains, including CCAligned, CCMatrix, TED, QED, OPUS, TICO, Wikimedia, MultiCCAligned, EMILLE, IJCNLP, XLEnt, and ParaCrawl.
- **Test Corpus:** FLORES-101

## Model Details

- **BLEU Score:** 98.1
- **Word-level Accuracy:** 99.5%
- **Character Error Rate (CER):** 99.1%

You may also explore our Shahmukhi-to-Gurmukhi model, which achieves a **BLEU score** of 87.7, [here](https://huggingface.co/SLPG/Punjabi_Shahmukhi_to_Gurmukhi_Transliteration/).

## Usage

These resources are intended to facilitate research and development in Punjabi transliteration. They can be used to train new models or improve existing ones for high-quality transliteration between the Gurmukhi and Shahmukhi scripts.

## Citation

**If you use our model, kindly cite our [paper]()**:

```
@article{Shehzadi2024,
  title={Unsupervised Punjabi Corpus and Neural Machine Transliteration System},
  author={Shehzadi Ambreen and Sadaf Abdul Rauf and MG Abbas Malik and Muhammad Imran},
  journal={Heliyon},
  year={2024},
  note={Under review}
}
```
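
## Evaluation Metrics (Illustrative)

Model Details reports word-level accuracy and character error rate (CER). The snippet below is a minimal sketch of how such metrics are commonly computed on transliteration output; it is not the evaluation script used for the reported scores. The function names (`levenshtein`, `cer`, `word_accuracy`) are hypothetical, and the position-by-position word comparison is an assumption that works when hypothesis and reference have the same number of words, as is typical for transliteration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(hypotheses, references):
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(levenshtein(h, r) for h, r in zip(hypotheses, references))
    total = sum(len(r) for r in references)
    return edits / total if total else 0.0


def word_accuracy(hypotheses, references):
    """Share of reference words matched exactly at the same position (assumed definition)."""
    correct = total = 0
    for h, r in zip(hypotheses, references):
        h_words, r_words = h.split(), r.split()
        total += len(r_words)
        correct += sum(hw == rw for hw, rw in zip(h_words, r_words))
    return correct / total if total else 0.0
```

For example, `cer(["abcd"], ["abce"])` returns `0.25` (one substitution over four reference characters). In practice, the hypotheses would be the model's Shahmukhi output and the references the corresponding test sentences.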