---
library_name: transformers
license: cc-by-nc-4.0
language:
- az
pipeline_tag: token-classification
tags:
- NER
- Named Entity Recognition
widget:
- text: >-
    İyunun 11-i saat 20:55 radələrində Oğuz rayonu Tayıflı, Şirvanlı, Xalxal
    kəndlərinə diametri 10 mm olan dolu düşüb.
datasets:
- LocalDoc/azerbaijani-ner-dataset
---

# Azerbaijani Named Entity Recognition (NER) Model

This repository contains a Named Entity Recognition (NER) model for the Azerbaijani language, along with example code for running it. The model is based on the XLM-RoBERTa architecture and was fine-tuned on a custom dataset, `LocalDoc/azerbaijani-ner-dataset`.

## Model Description

The model recognizes the following 25 classes. Its configuration exposes them under the generic names `LABEL_0` to `LABEL_24`, so the pipeline output has to be mapped to the entity types below:

- `LABEL_0` (`O`): Outside any named entity
- `LABEL_1` (`PERSON`): Names of individuals
- `LABEL_2` (`LOCATION`): Geographical locations, both man-made and natural
- `LABEL_3` (`ORGANISATION`): Names of companies and institutions
- `LABEL_4` (`DATE`): Dates or periods
- `LABEL_5` (`TIME`): Times of the day
- `LABEL_6` (`MONEY`): Monetary values
- `LABEL_7` (`PERCENTAGE`): Percentage values
- `LABEL_8` (`FACILITY`): Buildings, airports, etc.
- `LABEL_9` (`PRODUCT`): Products and goods
- `LABEL_10` (`EVENT`): Events and occurrences
- `LABEL_11` (`ART`): Artworks and titles of books or songs
- `LABEL_12` (`LAW`): Legal documents
- `LABEL_13` (`LANGUAGE`): Languages
- `LABEL_14` (`GPE`): Countries, cities, states
- `LABEL_15` (`NORP`): Nationalities or religious or political groups
- `LABEL_16` (`ORDINAL`): Ordinal numbers
- `LABEL_17` (`CARDINAL`): Cardinal numbers
- `LABEL_18` (`DISEASE`): Diseases and medical conditions
- `LABEL_19` (`CONTACT`): Contact information, such as phone numbers and email addresses
- `LABEL_20` (`ADAGE`): Proverbs and sayings
- `LABEL_21` (`QUANTITY`): Measurements and quantities
- `LABEL_22` (`MISCELLANEOUS`): Miscellaneous entities
- `LABEL_23` (`POSITION`): Professional or social positions
- `LABEL_24` (`PROJECT`): Names of projects or programs
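
The list above works as a lookup table from the generic `LABEL_n` names to entity types. As a minimal sketch, the table can be built once from the ordered type names; `ENTITY_TYPES`, `ID2LABEL`, `LABEL2ID` and `to_entity_type` are hypothetical names used for illustration, not something shipped with the model:

```python
# Hypothetical helper: entity types in the order LABEL_0 ... LABEL_24,
# transcribed from the label list in this model card.
ENTITY_TYPES = [
    "O", "PERSON", "LOCATION", "ORGANISATION", "DATE", "TIME", "MONEY",
    "PERCENTAGE", "FACILITY", "PRODUCT", "EVENT", "ART", "LAW", "LANGUAGE",
    "GPE", "NORP", "ORDINAL", "CARDINAL", "DISEASE", "CONTACT", "ADAGE",
    "QUANTITY", "MISCELLANEOUS", "POSITION", "PROJECT",
]

# Class index -> entity type, and the reverse direction.
ID2LABEL = dict(enumerate(ENTITY_TYPES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_entity_type(label: str) -> str:
    """Map a generic pipeline label such as 'LABEL_14' to its entity type."""
    prefix, _, index = label.rpartition("_")
    if prefix != "LABEL" or not index.isdigit():
        raise ValueError(f"unexpected label format: {label!r}")
    return ID2LABEL[int(index)]
```

Because `from_pretrained` uses keyword arguments that match configuration attributes to override the loaded config, passing `id2label=ID2LABEL, label2id=LABEL2ID` when loading the model should make the pipeline report readable tags directly (and skip `O` spans through its default `ignore_labels`). This has not been checked against this checkpoint, and with it in place the `split('_')` index lookup in the usage example below would need to be removed.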

## Installation

To use the model, install the required libraries with `pip`. The example below uses the PyTorch model classes from `transformers`, so PyTorch is needed as well:

```bash
pip install transformers
pip install datasets
pip install torch
```
## Usage

The snippet below loads the model and tokenizer from the Hugging Face Hub, runs a token-classification pipeline on an example sentence, and maps the generic `LABEL_n` output names to the entity types listed above.

```python
from transformers import pipeline, XLMRobertaTokenizerFast, XLMRobertaForTokenClassification

# Load the model and tokenizer
tokenizer = XLMRobertaTokenizerFast.from_pretrained("LocalDoc/ner_azerbaijan")
model = XLMRobertaForTokenClassification.from_pretrained("LocalDoc/ner_azerbaijan")

# Create the NER pipeline; "simple" aggregation merges consecutive tokens
# with the same predicted label into a single entity span
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = "Komitədən bildirilib ki, sovet dövründə Azərbaycanda cəmi 17 məscid fəaliyyət göstərirdisə, dövlət müstəqilliyinin bərpasından sonra ölkədə 814 məscid tikilib."

# Perform NER
ner_results = nlp(example)

# Mapping of label indices to their descriptions
label_mapping = {
    0: "O",
    1: "PERSON",
    2: "LOCATION",
    3: "ORGANISATION",
    4: "DATE",
    5: "TIME",
    6: "MONEY",
    7: "PERCENTAGE",
    8: "FACILITY",
    9: "PRODUCT",
    10: "EVENT",
    11: "ART",
    12: "LAW",
    13: "LANGUAGE",
    14: "GPE",
    15: "NORP",
    16: "ORDINAL",
    17: "CARDINAL",
    18: "DISEASE",
    19: "CONTACT",
    20: "ADAGE",
    21: "QUANTITY",
    22: "MISCELLANEOUS",
    23: "POSITION",
    24: "PROJECT",
}

# Print results with mapped entity types
for result in ner_results:
    # entity_group looks like "LABEL_14"; the number after the underscore indexes label_mapping
    entity_group = result["entity_group"]
    entity_description = label_mapping[int(entity_group.split("_")[-1])]
    print({
        "entity_group": entity_description,
        "score": result["score"],
        "word": result["word"],
        "start": result["start"],
        "end": result["end"],
    })
```
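
Because the pipeline skips only the label `O` by default and this model reports that class as `LABEL_0`, the loop above also prints the non-entity spans. Passing `ignore_labels=["LABEL_0"]` to `pipeline(...)` drops them at the source. Alternatively, a post-processing step can filter and group the results. The sketch below assumes the result format produced with `aggregation_strategy="simple"`; the function name `collect_entities` and its `min_score` threshold are hypothetical additions, not part of the model card:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def collect_entities(
    ner_results: Iterable[Mapping],
    label_mapping: Mapping[int, str],
    text: str,
    min_score: float = 0.0,
) -> Dict[str, List[str]]:
    """Group aggregated pipeline output by entity type (hypothetical helper).

    Expects dicts with 'entity_group' ('LABEL_n'), 'score', 'start' and 'end'.
    Spans mapped to 'O' and spans scoring below min_score are skipped.
    Surface forms are sliced from the input text via the character offsets.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in ner_results:
        # "LABEL_14" -> 14 -> "GPE"
        entity_type = label_mapping[int(result["entity_group"].split("_")[-1])]
        if entity_type == "O" or result["score"] < min_score:
            continue
        grouped[entity_type].append(text[result["start"]:result["end"]])
    return dict(grouped)
```

For example, `collect_entities(ner_results, label_mapping, example, min_score=0.5)` returns a dict keyed by entity type. Slicing `text[start:end]` rather than reading `result["word"]` keeps each surface form exactly as it appears in the input.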

## License

This model is licensed under the CC BY-NC-ND 4.0 license, which sets the following terms:

- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- Non-Commercial: You may not use the material for commercial purposes.
- No Derivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.

For more information, please refer to the <a target="_blank" href="https://creativecommons.org/licenses/by-nc-nd/4.0/">CC BY-NC-ND 4.0 license</a>.


## Contact

For more information, questions, or issues, please contact LocalDoc at [[email protected]].