---
license: mit
datasets:
- ljvmiranda921/tlunified-ner
language:
- tl
metrics:
- f1
tags:
- gliner
pipeline_tag: token-classification
model-index:
- name: tl_gliner_small
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: tlunified-ner
      name: TLUnified-NER
      split: test
      revision: 3f7dab9d232414ec6204f8d6934b9a35f90a254f
    metrics:
    - type: f1
      value: 0.8483
      name: F1
---

# GLiNER (small) model finetuned on Tagalog data

This model was finetuned using the [GLiNER v2.5 suite](https://github.com/urchade/GLiNER) of models.
You can find and replicate the training pipeline on [GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0-gliner).

## Usage

```python
from gliner import GLiNER

# Load the finetuned Tagalog GLiNER model
model = GLiNER.from_pretrained("ljvmiranda921/tl_gliner_small")

# Sample text for entity prediction
# Reference: Leni Robredo’s speech at the 2022 UP College of Law recognition rites
text = """
Nagsimula ako sa Public Attorney’s Office, kung saan araw-araw, mula Lunes hanggang Biyernes, nasa loob ako ng iba’t ibang court room at tambak ang kaso. Bawat Sabado, nasa BJMP ako para ihanda ang aking mga kliyente. Nahasa ako sa crim law at litigation. Pero kinalaunan, lumipat ako sa isang NGO, ‘yung Sentro ng Alternatibong Lingap Panligal. Sa SALIGAN talaga ako nahubog bilang abugado: imbes na tinatanggap na lang ang mga batas na kailangang sundin, nagtatanong din kung ito ba ay tunay na instrumento para makapagbigay ng katarungan sa ordinaryong Pilipino. Imbes na maghintay ng mga kliyente sa de-aircon na opisina, dinadayo namin ang mga malalayong komunidad. Kadalasan, naka-tsinelas, naka-t-shirt at maong, hinahanap namin ang mga komunidad, tinatawid ang mga bundok, palayan, at mga ilog para tumungo sa mga lugar kung saan hirap ang mga batayang sektor na makakuha ng access to justice.
Naaalala ko pa noong naging lead lawyer ako para sa isang proyekto: sa loob ng mahigit dalawang taon, bumibiyahe ako buwan-buwan papunta sa malayong isla ng Masbate, nagpa-paralegal training sa mga batayang sektor doon, ipinapaliwanag, itinituturo, at sinasanay sila sa mga batas na nagbibigay-proteksyon sa mga karapatan nila.
"""

# Labels for entity prediction
# Most GLiNER models should work best when entity types are in lower case or title case
labels = ["person", "organization", "location"]

# Perform entity prediction
entities = model.predict_entities(text, labels, threshold=0.5)

# Display predicted entities and their labels
for entity in entities:
    print(entity["text"], "=>", entity["label"])

# Sample output:
# Public Attorney’s Office => organization
# BJMP => organization
# Sentro ng Alternatibong Lingap Panligal => organization
# Masbate => location
```

## Citation

Please cite the following papers when using these models:

```
@misc{zaratiana2023gliner,
    title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
    author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
    year={2023},
    eprint={2311.08526},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

```
@inproceedings{miranda-2023-calamancy,
    title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.1",
    pages = "1--7",
}
```

If you're using the NER dataset:

```
@inproceedings{miranda-2023-developing,
    title = "Developing a Named Entity Recognition Dataset for {T}agalog",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the First Workshop in South East Asian Language Processing",
    month = nov,
    year = "2023",
    address = "Nusa Dua, Bali, Indonesia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.sealp-1.2",
    doi = "10.18653/v1/2023.sealp-1.2",
    pages = "13--20",
}
```
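
## Post-processing predictions

The Usage example prints each prediction as it comes. If you want the results grouped by entity type instead, something like the sketch below may help. It assumes each prediction is a dict with `text` and `label` keys (as in the Usage example) and, optionally, a `score` key; the `score` key is an assumption here, so check it against your installed GLiNER version. The helper name `group_entities` and the sample scores are hypothetical and only stand in for real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group entity texts by label, skipping low-score and repeated mentions."""
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: predictions may carry a "score" key; treat a missing
        # score as fully confident so the entity is never dropped by accident.
        if entity.get("score", 1.0) < min_score:
            continue
        mentions = grouped[entity["label"]]
        if entity["text"] not in mentions:
            mentions.append(entity["text"])
    return dict(grouped)


# Hand-written stand-in for model.predict_entities(...) output.
# The scores are made-up placeholders, not real model scores.
sample_entities = [
    {"text": "BJMP", "label": "organization", "score": 0.90},
    {"text": "BJMP", "label": "organization", "score": 0.80},
    {"text": "Masbate", "label": "location", "score": 0.95},
    {"text": "Lunes", "label": "location", "score": 0.30},
]

print(group_entities(sample_entities, min_score=0.5))
```

With real predictions, pass the list returned by `model.predict_entities(text, labels, threshold=0.5)` in place of `sample_entities`.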