---
license: cc-by-nc-4.0
language:
- ko
pipeline_tag: token-classification
library_name: gliner
---

# Model Card for GLiNER-ko

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which are flexible but too large and costly for resource-constrained scenarios.

This version was trained on **various Korean NER** datasets and is released for research purposes. Commercially permissive versions are available: **urchade/gliner_smallv2**, **urchade/gliner_mediumv2**, and **urchade/gliner_largev2**.

## Links

* Paper: https://arxiv.org/abs/2311.08526
* Repository: https://github.com/urchade/GLiNER

## Installation

To use this model, install the GLiNER Python library and `python-mecab-ko`. The commands below use notebook syntax; drop the leading `!` when running them in a shell:

```
!pip install gliner
!pip install python-mecab-ko
```
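
Before loading the model, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library. It assumes that `python-mecab-ko` exposes a top-level module named `mecab` (an assumption about the package, not something stated in this card), and `missing_modules` is a hypothetical helper introduced only for illustration.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: `pip install python-mecab-ko` provides a module named `mecab`.
REQUIRED = ["gliner", "mecab"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If a module is reported as missing, rerun the corresponding `pip install` command in the same environment.
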
## Usage

Once the GLiNER library is installed, import the `GLiNER` class, load this model with `GLiNER.from_pretrained`, and predict entities with `predict_entities`.

```python
from gliner import GLiNER

# Load the Korean GLiNER checkpoint from the Hugging Face Hub
model = GLiNER.from_pretrained("taeminlee/gliner_ko")

text = """
피터 잭슨 경(, 1961년 10월 31일 ~ )은 뉴질랜드의 영화 감독, 각본가, 영화 프로듀서이다. J. R. R. 톨킨의 소설을 원작으로 한 《반지의 제왕 영화 3부작》(2001년~2003년)의 감독으로 가장 유명하다. 2005년에는 1933년작 킹콩의 리메이크작 《킹콩(2005)》의 감독을 맡았다.
"""

# Entity types to extract
tta_labels = ["ARTIFACTS", "ANIMAL", "CIVILIZATION", "DATE", "EVENT", "STUDY_FIELD", "LOCATION", "MATERIAL", "ORGANIZATION", "PERSON", "PLANT", "QUANTITY", "TIME", "TERM", "THEORY"]

entities = model.predict_entities(text, tta_labels)

# Print each detected span with its predicted label
for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

```
피터 잭슨 경 => PERSON
1961년 10월 31일 ~ => DATE
뉴질랜드 => LOCATION
영화 감독 => CIVILIZATION
각본가 => CIVILIZATION
영화 => CIVILIZATION
프로듀서 => CIVILIZATION
J. R. R. 톨킨 => PERSON
3부작 => QUANTITY
2001년~2003년 => DATE
감독 => CIVILIZATION
2005년 => DATE
1933년작 => DATE
킹콩 => ARTIFACTS
킹콩 => ARTIFACTS
2005 => DATE
감독 => CIVILIZATION
```
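
The flat list of predictions is often easier to read when grouped by label. The sketch below shows one possible post-processing step. `group_by_label` is a hypothetical helper, not part of the GLiNER library, and it relies only on the `text` and `label` keys used in the usage example. The sample list is hand-written in that shape so the block runs without downloading the model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to its entity texts, in first-seen order, without duplicates."""
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Hand-written sample mimicking a few predictions; not real model output.
sample = [
    {"text": "피터 잭슨 경", "label": "PERSON"},
    {"text": "뉴질랜드", "label": "LOCATION"},
    {"text": "J. R. R. 톨킨", "label": "PERSON"},
    {"text": "킹콩", "label": "ARTIFACTS"},
    {"text": "킹콩", "label": "ARTIFACTS"},
]

for label, texts in group_by_label(sample).items():
    print(label, "=>", texts)
```

With real predictions, pass the list returned by `model.predict_entities` in place of `sample`.
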
## Named Entity Recognition benchmark results

Evaluated on the [konne dev set](https://github.com/korean-named-entity/konne).

| Model | Precision (P) | Recall (R) | F1 |
|------------------|-----------|-----------|--------|
| GLiNER-ko (t=0.5) | **72.51%** | **79.82%** | **75.99%** |
| GLiNER Large-v2 (t=0.5) | 34.33% | 19.50% | 24.87% |
| GLiNER Multi (t=0.5) | 40.94% | 34.18% | 37.26% |
| Pororo | 70.25% | 57.94% | 63.50% |

Here `t` is the prediction threshold used for the GLiNER models.
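
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes F1 from the reported precision and recall values. `f1_score` is a small helper written for this illustration, and the numbers are copied from the benchmark table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, here percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the benchmark table.
rows = {
    "GLiNER-ko (t=0.5)": (72.51, 79.82, 75.99),
    "GLiNER Large-v2 (t=0.5)": (34.33, 19.50, 24.87),
    "GLiNER Multi (t=0.5)": (40.94, 34.18, 37.26),
    "Pororo": (70.25, 57.94, 63.50),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}%, reported = {reported:.2f}%")
```
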
## Model Authors

The model authors are:

* [Taemin Lee](http://tmkor.com)
* [Urchade Zaratiana](https://huggingface.co/urchade)
* Nadi Tomeh
* Pierre Holat
* Thierry Charnois

## Citation

```bibtex
@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```