	---
	license: mit
	datasets:
	- ljvmiranda921/tlunified-ner
	language:
	- tl
	metrics:
	- f1
	tags:
	- gliner
	pipeline_tag: token-classification
	model-index:
	- name: tl_gliner_small
	  results:
	  - task:
	      type: token-classification
	      name: Named Entity Recognition
	    dataset:
	      type: tlunified-ner
	      name: TLUnified-NER
	      split: test
	      revision: 3f7dab9d232414ec6204f8d6934b9a35f90a254f
	    metrics:
	    - type: f1
	      value: 0.8483
	      name: F1
	---

	# GLiNER (small) model finetuned on Tagalog data

	This model was finetuned using the [GLiNER v2.5 suite](https://github.com/urchade/GLiNER) of models.
	You can find and replicate the training pipeline on [GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0-gliner).

	## Usage

	```python
	from gliner import GLiNER

	# Load the Tagalog fine-tuned GLiNER model from the Hugging Face Hub
	model = GLiNER.from_pretrained("ljvmiranda921/tl_gliner_small")

	# Sample text for entity prediction
	# Reference: Leni Robredo’s speech at the 2022 UP College of Law recognition rites
	text = """
	Nagsimula ako sa Public Attorney’s Office, kung saan araw-araw, mula Lunes hanggang Biyernes, nasa loob ako ng iba’t ibang court room at tambak ang kaso.
	Bawat Sabado, nasa BJMP ako para ihanda ang aking mga kliyente. Nahasa ako sa crim law at litigation. Pero kinalaunan, lumipat ako sa isang NGO,
	‘yung Sentro ng Alternatibong Lingap Panligal. Sa SALIGAN talaga ako nahubog bilang abugado: imbes na tinatanggap na lang ang mga batas na kailangang
	sundin, nagtatanong din kung ito ba ay tunay na instrumento para makapagbigay ng katarungan sa ordinaryong Pilipino. Imbes na maghintay ng mga kliyente
	sa de-aircon na opisina, dinadayo namin ang mga malalayong komunidad. Kadalasan, naka-tsinelas, naka-t-shirt at maong, hinahanap namin ang mga komunidad,
	tinatawid ang mga bundok, palayan, at mga ilog para tumungo sa mga lugar kung saan hirap ang mga batayang sektor na makakuha ng access to justice.
	Naaalala ko pa noong naging lead lawyer ako para sa isang proyekto: sa loob ng mahigit dalawang taon, bumibiyahe ako buwan-buwan papunta sa malayong
	isla ng Masbate, nagpa-paralegal training sa mga batayang sektor doon, ipinapaliwanag, itinituturo, at sinasanay sila sa mga batas na nagbibigay-proteksyon
	sa mga karapatan nila.
	"""

	# Labels for entity prediction
	# Most GLiNER models should work best when entity types are in lower case or title case
	labels = ["person", "organization", "location"]

	# Perform entity prediction
	entities = model.predict_entities(text, labels, threshold=0.5)

	# Display predicted entities and their labels
	for entity in entities:
	    print(entity["text"], "=>", entity["label"])

	# Sample output:
	# Public Attorney’s Office => organization
	# BJMP => organization
	# Sentro ng Alternatibong Lingap Panligal => organization
	# Masbate => location

	```
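The predictions come back as a flat list of dictionaries, so it is often convenient to group them by label before using them downstream. The following is a minimal sketch rather than part of the GLiNER API: `group_by_label` and `min_score` are hypothetical names, and it assumes each entity dictionary carries a `"score"` key alongside the `"text"` and `"label"` keys used above. The sample list is hand-written so the snippet runs without downloading the model, and its scores are made up for illustration.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, keeping entities with score >= min_score.

    Hypothetical helper, not part of the gliner package.
    """
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: entities expose a "score" key; if it is missing,
        # the entity is kept by treating its score as 1.0.
        if entity.get("score", 1.0) >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written stand-in for the list returned by model.predict_entities().
# The scores are illustrative values, not real model output.
sample_entities = [
    {"text": "Public Attorney’s Office", "label": "organization", "score": 0.93},
    {"text": "BJMP", "label": "organization", "score": 0.88},
    {"text": "Masbate", "label": "location", "score": 0.61},
]

grouped = group_by_label(sample_entities, min_score=0.7)
print(grouped)
```

With real predictions, pass the `entities` list from the usage example instead of `sample_entities`. Filtering here is applied on top of whatever `threshold` was used in `predict_entities`, so it can only narrow the results, not recover entities the model already dropped.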


	## Citation

	Please cite the following papers when using these models:

	```
	@misc{zaratiana2023gliner,
	  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
	  author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
	  year={2023},
	  eprint={2311.08526},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL}
	}
	```

	```
	@inproceedings{miranda-2023-calamancy,
	  title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
	  author = "Miranda, Lester James",
	  booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
	  month = dec,
	  year = "2023",
	  address = "Singapore, Singapore",
	  publisher = "Empirical Methods in Natural Language Processing",
	  url = "https://aclanthology.org/2023.nlposs-1.1",
	  pages = "1--7",
	}
	```

	If you're using the NER dataset:

	```
	@inproceedings{miranda-2023-developing,
	  title = "Developing a Named Entity Recognition Dataset for {T}agalog",
	  author = "Miranda, Lester James",
	  booktitle = "Proceedings of the First Workshop in South East Asian Language Processing",
	  month = nov,
	  year = "2023",
	  address = "Nusa Dua, Bali, Indonesia",
	  publisher = "Association for Computational Linguistics",
	  url = "https://aclanthology.org/2023.sealp-1.2",
	  doi = "10.18653/v1/2023.sealp-1.2",
	  pages = "13--20",
	}
	```