---
library_name: transformers
license: cc-by-nc-4.0
language:
- az
pipeline_tag: token-classification
tags:
- NER
- Named Entity Recognition
widget:
- text: >-
    İyunun 11-i saat 20:55 radələrində Oğuz rayonu Tayıflı, Şirvanlı, Xalxal
    kəndlərinə diametri 10 mm olan dolu düşüb.
datasets:
- LocalDoc/azerbaijani-ner-dataset
---

# Azerbaijani Named Entity Recognition (NER) Model

This repository contains the code and model for Named Entity Recognition (NER) in the Azerbaijani language. The model is built on the XLM-RoBERTa architecture and fine-tuned on a custom dataset.

## Model Description

The model recognizes the following entity types:

- LABEL_0: **O**: Outside any named entity
- LABEL_1: **PERSON**: Names of individuals
- LABEL_2: **LOCATION**: Geographical locations, both man-made and natural
- LABEL_3: **ORGANISATION**: Names of companies, institutions
- LABEL_4: **DATE**: Dates or periods
- LABEL_5: **TIME**: Times of the day
- LABEL_6: **MONEY**: Monetary values
- LABEL_7: **PERCENTAGE**: Percentage values
- LABEL_8: **FACILITY**: Buildings, airports, etc.
- LABEL_9: **PRODUCT**: Products and goods
- LABEL_10: **EVENT**: Events and occurrences
- LABEL_11: **ART**: Artworks, titles of books, songs
- LABEL_12: **LAW**: Legal documents
- LABEL_13: **LANGUAGE**: Languages
- LABEL_14: **GPE**: Countries, cities, states
- LABEL_15: **NORP**: Nationalities or religious or political groups
- LABEL_16: **ORDINAL**: Ordinal numbers
- LABEL_17: **CARDINAL**: Cardinal numbers
- LABEL_18: **DISEASE**: Diseases and medical conditions
- LABEL_19: **CONTACT**: Contact information, e.g., phone numbers, emails
- LABEL_20: **ADAGE**: Proverbs, sayings
- LABEL_21: **QUANTITY**: Measurements and quantities
- LABEL_22: **MISCELLANEOUS**: Miscellaneous entities
- LABEL_23: **POSITION**: Professional or social positions
- LABEL_24: **PROJECT**: Names of projects or programs

## Installation

To use the model, you need to install the required libraries.
You can do this using `pip`:

```bash
pip install transformers
pip install datasets
```

## Usage

```python
from transformers import pipeline, XLMRobertaTokenizerFast, XLMRobertaForTokenClassification

# Load the model and tokenizer
tokenizer = XLMRobertaTokenizerFast.from_pretrained("LocalDoc/ner_azerbaijan")
model = XLMRobertaForTokenClassification.from_pretrained("LocalDoc/ner_azerbaijan")

# Create NER pipeline; "simple" aggregation merges adjacent tokens with the same label
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = "Komitədən bildirilib ki, sovet dövründə Azərbaycanda cəmi 17 məscid fəaliyyət göstərirdisə, dövlət müstəqilliyinin bərpasından sonra ölkədə 814 məscid tikilib."

# Perform NER
ner_results = nlp(example)

# Mapping of label indices to their descriptions
label_mapping = {
    0: "O",
    1: "PERSON",
    2: "LOCATION",
    3: "ORGANISATION",
    4: "DATE",
    5: "TIME",
    6: "MONEY",
    7: "PERCENTAGE",
    8: "FACILITY",
    9: "PRODUCT",
    10: "EVENT",
    11: "ART",
    12: "LAW",
    13: "LANGUAGE",
    14: "GPE",
    15: "NORP",
    16: "ORDINAL",
    17: "CARDINAL",
    18: "DISEASE",
    19: "CONTACT",
    20: "ADAGE",
    21: "QUANTITY",
    22: "MISCELLANEOUS",
    23: "POSITION",
    24: "PROJECT",
}

# Print results with mapped entity types
for result in ner_results:
    # entity_group looks like "LABEL_14"; the trailing number is the label index
    entity_group = result['entity_group']
    entity_description = label_mapping[int(entity_group.split('_')[-1])]
    print({
        'entity_group': entity_description,
        'score': result['score'],
        'word': result['word'],
        'start': result['start'],
        'end': result['end']
    })
```

## License

This model is licensed under the CC BY-NC-ND 4.0 license.

What does this license allow?

- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- **Non-Commercial**: You may not use the material for commercial purposes.
- **No Derivatives**: If you remix, transform, or build upon the material, you may not distribute the modified material.

For more information, please refer to the CC BY-NC-ND 4.0 license.
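## Post-processing the results

The label mapping from the usage example can also be applied to a whole result list at once. The following is a minimal sketch, not part of the original model code: `group_by_type`, `LABEL_NAMES`, and the `min_score` threshold are hypothetical names chosen for illustration, and the sample list is hand-written in the shape of the pipeline output rather than real model predictions. It assumes the outside class is reported as `LABEL_0` and should be dropped.

```python
from collections import defaultdict

# Label names in index order, copied from the model description.
LABEL_NAMES = [
    "O", "PERSON", "LOCATION", "ORGANISATION", "DATE", "TIME", "MONEY",
    "PERCENTAGE", "FACILITY", "PRODUCT", "EVENT", "ART", "LAW", "LANGUAGE",
    "GPE", "NORP", "ORDINAL", "CARDINAL", "DISEASE", "CONTACT", "ADAGE",
    "QUANTITY", "MISCELLANEOUS", "POSITION", "PROJECT",
]


def group_by_type(results, min_score=0.5):
    """Group pipeline-style results by readable entity type.

    Skips spans labeled "O" and spans scoring below min_score.
    """
    grouped = defaultdict(list)
    for item in results:
        # "LABEL_14" -> 14 -> "GPE"
        index = int(item["entity_group"].rsplit("_", 1)[-1])
        name = LABEL_NAMES[index]
        if name == "O" or item["score"] < min_score:
            continue
        grouped[name].append(item["word"])
    return dict(grouped)


# Hand-written sample (NOT real model output). Only the keys the function
# reads are included; the real pipeline also returns "start" and "end".
sample = [
    {"entity_group": "LABEL_0", "score": 0.99, "word": "Komitədən bildirilib ki,"},
    {"entity_group": "LABEL_14", "score": 0.95, "word": "Azərbaycanda"},
    {"entity_group": "LABEL_17", "score": 0.40, "word": "17"},
    {"entity_group": "LABEL_17", "score": 0.90, "word": "814"},
]

print(group_by_type(sample))
# {'GPE': ['Azərbaycanda'], 'CARDINAL': ['814']}
```

With real predictions, pass `ner_results` from the pipeline in place of `sample`. The 0.5 threshold is arbitrary and worth tuning for your data.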
## Contact

For more information, questions, or issues, please contact LocalDoc at [v.resad.89@gmail.com].