---
tags:
- spacy
- token-classification
language:
- tl
license: mit
library_name: spacy
pipeline_tag: token-classification
model-index:
- name: Medium-sized calamanCy pipeline by L.J. Miranda
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: tlunified-ner
      name: TLUnified-NER
      split: test
      revision: 3f7dab9d232414ec6204f8d6934b9a35f90a254f
    metrics:
    - type: f1
      value: 0.889
      name: F1
datasets:
- ljvmiranda921/tlunified-ner
---
calamanCy: Tagalog NLP pipelines in spaCy

Paper: [arxiv.org/abs/2311.07171](https://arxiv.org/abs/2311.07171)

| Feature | Description |
| --- | --- |
| **Name** | `tl_calamancy_lg` |
| **Version** | `0.1.0` |
| **spaCy** | `>=3.5.0,<4.0.0` |
| **Default Pipeline** | `tok2vec`, `tagger`, `morphologizer`, `parser`, `ner` |
| **Components** | `tok2vec`, `tagger`, `morphologizer`, `parser`, `ner` |
| **Vectors** | 714435 keys, 714435 unique vectors (300 dimensions) |
| **Sources** | [TLUnified dataset](https://aclanthology.org/2022.lrec-1.703/) (Jan Christian Blaise Cruz and Charibeth Cheng)<br />[UD_Tagalog-TRG](https://universaldependencies.org/treebanks/tl_trg/index.html) (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan)<br />[UD_Tagalog-Ugnayan](https://universaldependencies.org/treebanks/tl_ugnayan/index.html) (Angelina Aquino) |
| **License** | `MIT` |
| **Author** | [Lester James V. Miranda](https://github.com/ljvmiranda921/calamanCy) |
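
As a minimal sketch of how the `ner` component listed above might be used, the snippet below loads the pipeline by its package name and groups the predicted entities by label. It assumes that spaCy and the `tl_calamancy_lg` package are already installed in the environment; `load_pipeline` and `group_entities` are hypothetical helper names introduced here for illustration and are not part of calamanCy.

```python
from collections import defaultdict


def load_pipeline(name="tl_calamancy_lg"):
    """Load the pipeline; assumes spaCy and the model package are installed."""
    import spacy  # imported lazily so group_entities() works without spaCy

    return spacy.load(name)


def group_entities(doc):
    """Group entity texts by label (the ner component emits LOC, ORG, PER)."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

With the model installed, something like `group_entities(load_pipeline()("Si Jose Rizal ay ipinanganak sa Calamba."))` would return a dictionary keyed by entity label; the exact entities depend on the model's predictions.
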
### Label Scheme
<details>
<summary>View label scheme (120 labels for 4 components)</summary>
| Component | Labels |
| --- | --- |
| **`tagger`** | `ADJ`, `ADJ_PART`, `ADP`, `ADV`, `ADV_PART`, `AUX`, `CCONJ`, `DET`, `DET_ADP`, `DET_PART`, `INTJ`, `NOUN`, `NOUN_PART`, `NUM`, `NUM_PART`, `PART`, `PRON`, `PRON_PART`, `PROPN`, `PUNCT`, `SCONJ`, `VERB`, `VERB_PART` |
| **`morphologizer`** | `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Act`, `Case=Nom\|POS=ADP`, `POS=NOUN`, `POS=PUNCT`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Pass`, `Case=Gen\|POS=ADP`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Aspect=Imp\|Mood=Ind\|POS=VERB\|Voice=Act`, `POS=ADV\|PronType=Dem`, `Foreign=Yes\|POS=NOUN`, `Degree=Pos\|POS=ADJ`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Deixis=Med\|Number=Sing\|POS=PRON\|PronType=Dem`, `Gender=Masc\|POS=PROPN`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Link=Yes\|POS=ADJ`, `POS=ADP`, `Case=Dat\|POS=ADP`, `POS=VERB\|Polarity=Pos`, `Aspect=Hab\|POS=VERB`, `POS=SCONJ`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Aspect=Prosp\|Mood=Ind\|POS=VERB\|Voice=Act`, `POS=ADV`, `POS=PART\|Polarity=Neg`, `Aspect=Imp\|Mood=Ind\|POS=VERB\|Voice=Pass`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Lfoc`, `POS=PROPN`, `Case=Nom\|Deixis=Prox\|Number=Sing\|POS=PRON\|PronType=Dem`, `Gender=Masc\|POS=NOUN`, `Gender=Fem\|POS=NOUN`, `Degree=Pos\|Gender=Fem\|POS=ADJ`, `Gender=Fem\|POS=PROPN`, `Case=Nom\|Clusivity=In\|Number=Dual\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=DET\|PronType=Ind`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=PRON\|PronType=Prs\|Reflex=Yes`, `Gender=Masc\|POS=DET\|PronType=Emp`, `Case=Nom\|POS=PRON\|PronType=Int`, `Link=Yes\|POS=NOUN`, `POS=PART\|PartType=Int`, `POS=INTJ\|Polarity=Pos`, `Link=Yes\|POS=PART\|PartType=Int`, `POS=VERB\|Polarity=Neg`, `Degree=Pos\|POS=ADJ\|PronType=Int`, `Case=Gen\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|PronType=Int\|Voice=Act`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|PronType=Int\|Voice=Pass`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Ifoc`, `POS=ADV\|PronType=Int`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Act`, `POS=PART\|PartType=Nfh`, `Deixis=Remt\|POS=ADV\|PronType=Dem`, `Aspect=Imp\|Mood=Pot\|POS=VERB\|Voice=Act`, `Link=Yes\|POS=VERB\|Polarity=Pos`, `Link=Yes\|POS=VERB\|Polarity=Neg`, `POS=PART\|PartType=Des`, `Mood=Imp\|POS=AUX\|Polarity=Neg`, `Case=Nom\|Link=Yes\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Nom\|Link=Yes\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Pass`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Lfoc`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Bfoc`, `POS=DET\|PronType=Tot`, `Case=Dat\|Link=Yes\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Link=Yes\|POS=PRON\|PronType=Prs\|Reflex=Yes`, `Mood=Imp\|POS=VERB\|Voice=Act`, `Case=Dat\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Imp\|POS=VERB\|Voice=Lfoc`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Mood=Imp\|POS=VERB\|Voice=Pass`, `Case=Gen\|Clusivity=In\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Aspect=Hab\|POS=VERB\|Voice=Pass`, `Gender=Masc\|Link=Yes\|POS=PROPN`, `Case=Gen\|Link=Yes\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Link=Yes\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `POS=ADJ`, `POS=PART`, `POS=PRON`, `POS=VERB`, `POS=INTJ`, `POS=CCONJ`, `POS=NUM`, `POS=DET` |
| **`parser`** | `ROOT`, `advmod`, `case`, `dep`, `nmod`, `nsubj`, `obj`, `obl`, `punct` |
| **`ner`** | `LOC`, `ORG`, `PER` |
</details>
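
Each `morphologizer` label in the table above is a pipe-separated list of `Feature=Value` pairs with the coarse part of speech folded in as `POS=...` (the backslashes before the pipes are only Markdown table escaping). The following sketch shows one way to split such a label string into its part of speech and remaining features using plain Python. `parse_morph_label` is a hypothetical helper written for this illustration, not a spaCy or calamanCy function.

```python
def parse_morph_label(label):
    """Split a label like 'Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act'.

    Returns a (pos, feats) tuple, where feats is a dict of the
    remaining Feature=Value pairs. Markdown-escaped pipes ('\\|')
    are accepted as well, so labels can be pasted from the table.
    """
    feats = {}
    for pair in label.replace("\\|", "|").split("|"):
        key, _, value = pair.partition("=")
        feats[key] = value
    pos = feats.pop("POS", None)  # POS is stored alongside the features
    return pos, feats
```

For example, `parse_morph_label("Case=Nom|POS=ADP")` yields `("ADP", {"Case": "Nom"})`. When working with a loaded pipeline, the same information is available on each token through spaCy's `token.pos_` and `token.morph` attributes.
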
### Citation
```
@inproceedings{miranda-2023-calamancy,
title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
author = "Miranda, Lester James",
booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
month = dec,
year = "2023",
address = "Singapore, Singapore",
publisher = "Empirical Methods in Natural Language Processing",
url = "https://aclanthology.org/2023.nlposs-1.1",
pages = "1--7",
}
```