Supported languages

#2
by grofte - opened

From page 13 of the paper (table shown in image.png), as text:
ISO  NAME            ISO  NAME            ISO  NAME
af   AFRIKAANS       ht   HAITIAN_CREOLE  pt   PORTUGUESE
am   AMHARIC         hu   HUNGARIAN       ro   ROMANIAN
ar   ARABIC          hy   ARMENIAN        ru   RUSSIAN
as   ASSAMESE        id   INDONESIAN      rw   KINYARWANDA
az   AZERBAIJANI     ig   IGBO            si   SINHALESE
be   BELARUSIAN      is   ICELANDIC       sk   SLOVAK
bg   BULGARIAN       it   ITALIAN         sl   SLOVENIAN
bn   BENGALI         ja   JAPANESE        sm   SAMOAN
bo   TIBETAN         jv   JAVANESE        sn   SHONA
bs   BOSNIAN         ka   GEORGIAN        so   SOMALI
ca   CATALAN         kk   KAZAKH          sq   ALBANIAN
ceb  CEBUANO         km   KHMER           sr   SERBIAN
co   CORSICAN        kn   KANNADA         st   SESOTHO
cs   CZECH           ko   KOREAN          su   SUNDANESE
cy   WELSH           ku   KURDISH         sv   SWEDISH
da   DANISH          ky   KYRGYZ          sw   SWAHILI
de   GERMAN          la   LATIN           ta   TAMIL
el   GREEK           lb   LUXEMBOURGISH   te   TELUGU
en   ENGLISH         lo   LAOTHIAN        tg   TAJIK
eo   ESPERANTO       lt   LITHUANIAN      th   THAI
es   SPANISH         lv   LATVIAN         tk   TURKMEN
et   ESTONIAN        mg   MALAGASY        tl   TAGALOG
eu   BASQUE          mi   MAORI           tr   TURKISH
fa   PERSIAN         mk   MACEDONIAN      tt   TATAR
fi   FINNISH         ml   MALAYALAM       ug   UIGHUR
fr   FRENCH          mn   MONGOLIAN       uk   UKRAINIAN
fy   FRISIAN         mr   MARATHI         ur   URDU
ga   IRISH           ms   MALAY           uz   UZBEK
gd   SCOTS_GAELIC    mt   MALTESE         vi   VIETNAMESE
gl   GALICIAN        my   BURMESE         wo   WOLOF
gu   GUJARATI        ne   NEPALI          xh   XHOSA
ha   HAUSA           nl   DUTCH           yi   YIDDISH
haw  HAWAIIAN        no   NORWEGIAN       yo   YORUBA
he   HEBREW          ny   NYANJA          zh   CHINESE
hi   HINDI           or   ORIYA           zu   ZULU
hmn  HMONG           pa   PUNJABI
hr   CROATIAN        pl   POLISH

I don't know whether this means they excluded the two major forms of Norwegian, 'nn' (Nynorsk) and 'nb' (Bokmål), or mapped everything labelled as either to 'no'. I also don't know whether they added spaces to languages that don't normally put whitespace between words, such as Thai and Japanese.

Edit: The sentence counts are in supplemental figure 7, and Norwegian has almost as many examples as Danish, so the data is there. I'm just not sure which kind of Norwegian we are talking about.
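Until that is clarified, one defensive option on the caller's side is to fold 'nb' and 'nn' into 'no' before checking a code against the table. This is only a sketch: the folding rule is my assumption and is not confirmed by the paper, and `normalize_code`, `NORWEGIAN_VARIANTS`, and the three-code `supported` set are hypothetical names used for illustration.

```python
# Assumption (NOT confirmed by the paper): treat Bokmål ('nb') and
# Nynorsk ('nn') as the generic Norwegian code 'no'.
NORWEGIAN_VARIANTS = {"nb": "no", "nn": "no"}


def normalize_code(code, supported):
    """Return a code present in `supported`, or None if it is not listed.

    'nb' and 'nn' are folded into 'no' first (hypothetical rule).
    """
    code = code.strip().lower()
    code = NORWEGIAN_VARIANTS.get(code, code)
    return code if code in supported else None


# Small subset of the table above, for illustration only.
supported = {"da", "no", "sv"}

print(normalize_code("nb", supported))
print(normalize_code("NN", supported))
print(normalize_code("fo", supported))
```

If it turns out the two forms were excluded rather than merged, the mapping should be dropped so that 'nb' and 'nn' come back as unsupported.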

As a dictionary

{'af': 'AFRIKAANS',
 'sq': 'ALBANIAN',
 'am': 'AMHARIC',
 'ar': 'ARABIC',
 'hy': 'ARMENIAN',
 'as': 'ASSAMESE',
 'az': 'AZERBAIJANI',
 'eu': 'BASQUE',
 'be': 'BELARUSIAN',
 'bn': 'BENGALI',
 'bs': 'BOSNIAN',
 'bg': 'BULGARIAN',
 'my': 'BURMESE',
 'ca': 'CATALAN',
 'ceb': 'CEBUANO',
 'zh': 'CHINESE',
 'co': 'CORSICAN',
 'hr': 'CROATIAN',
 'cs': 'CZECH',
 'da': 'DANISH',
 'nl': 'DUTCH',
 'en': 'ENGLISH',
 'eo': 'ESPERANTO',
 'et': 'ESTONIAN',
 'fi': 'FINNISH',
 'fr': 'FRENCH',
 'fy': 'FRISIAN',
 'gl': 'GALICIAN',
 'ka': 'GEORGIAN',
 'de': 'GERMAN',
 'el': 'GREEK',
 'gu': 'GUJARATI',
 'ht': 'HAITIAN_CREOLE',
 'ha': 'HAUSA',
 'haw': 'HAWAIIAN',
 'he': 'HEBREW',
 'hi': 'HINDI',
 'hmn': 'HMONG',
 'hu': 'HUNGARIAN',
 'is': 'ICELANDIC',
 'ig': 'IGBO',
 'id': 'INDONESIAN',
 'ga': 'IRISH',
 'it': 'ITALIAN',
 'ja': 'JAPANESE',
 'jv': 'JAVANESE',
 'kn': 'KANNADA',
 'kk': 'KAZAKH',
 'km': 'KHMER',
 'rw': 'KINYARWANDA',
 'ko': 'KOREAN',
 'ku': 'KURDISH',
 'ky': 'KYRGYZ',
 'lo': 'LAOTHIAN',
 'la': 'LATIN',
 'lv': 'LATVIAN',
 'lt': 'LITHUANIAN',
 'lb': 'LUXEMBOURGISH',
 'mk': 'MACEDONIAN',
 'mg': 'MALAGASY',
 'ms': 'MALAY',
 'ml': 'MALAYALAM',
 'mt': 'MALTESE',
 'mi': 'MAORI',
 'mr': 'MARATHI',
 'mn': 'MONGOLIAN',
 'ne': 'NEPALI',
 'no': 'NORWEGIAN',
 'ny': 'NYANJA',
 'or': 'ORIYA',
 'fa': 'PERSIAN',
 'pl': 'POLISH',
 'pt': 'PORTUGUESE',
 'pa': 'PUNJABI',
 'ro': 'ROMANIAN',
 'ru': 'RUSSIAN',
 'sm': 'SAMOAN',
 'gd': 'SCOTS_GAELIC',
 'sr': 'SERBIAN',
 'st': 'SESOTHO',
 'sn': 'SHONA',
 'si': 'SINHALESE',
 'sk': 'SLOVAK',
 'sl': 'SLOVENIAN',
 'so': 'SOMALI',
 'es': 'SPANISH',
 'su': 'SUNDANESE',
 'sw': 'SWAHILI',
 'sv': 'SWEDISH',
 'tl': 'TAGALOG',
 'tg': 'TAJIK',
 'ta': 'TAMIL',
 'tt': 'TATAR',
 'te': 'TELUGU',
 'th': 'THAI',
 'bo': 'TIBETAN',
 'tr': 'TURKISH',
 'tk': 'TURKMEN',
 'ug': 'UIGHUR',
 'uk': 'UKRAINIAN',
 'ur': 'URDU',
 'uz': 'UZBEK',
 'vi': 'VIETNAMESE',
 'cy': 'WELSH',
 'wo': 'WOLOF',
 'xh': 'XHOSA',
 'yi': 'YIDDISH',
 'yo': 'YORUBA',
 'zu': 'ZULU'}
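
For anyone who wants to rebuild the dictionary from the three-column text rather than copy it, here is a minimal sketch. It assumes every row is a whitespace-separated run of code/name pairs and that names contain no spaces (the table uses underscores, e.g. HAITIAN_CREOLE). `parse_iso_table` is a hypothetical helper name, and `sample` is just a few rows taken from the table.

```python
def parse_iso_table(text):
    """Parse whitespace-separated 'ISO NAME' pairs into a {iso: name} dict."""
    mapping = {}
    for line in text.strip().splitlines():
        tokens = line.split()
        if tokens[:2] == ["ISO", "NAME"]:
            continue  # skip the header row
        # Tokens alternate code, name, code, name, ...; short rows just
        # yield fewer pairs.
        for iso, name in zip(tokens[0::2], tokens[1::2]):
            mapping[iso] = name
    return mapping


sample = """
ISO NAME ISO NAME ISO NAME
af AFRIKAANS ht HAITIAN_CREOLE pt PORTUGUESE
hmn HMONG pa PUNJABI
"""

langs = parse_iso_table(sample)
# Sort by language name, which is the order used in the dictionary above.
for iso, name in sorted(langs.items(), key=lambda kv: kv[1]):
    print(iso, name)
```

Because it splits on any run of whitespace, the same function works whether the columns are separated by single spaces or padded for alignment.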
