---
license: cc-by-4.0
datasets:
- wikiann
language:
- bg
model-index:
- name: bert-base-ner-bulgarian
  results: []
metrics:
- f1
pipeline_tag: token-classification
widget:
- text: 'Философът Барух Спиноза е роден в Амстердам.'
---
|
|
|
# 🇧🇬 BERT - Bulgarian Named Entity Recognition

This model is [rmihaylov/bert-base-bg](https://huggingface.co/rmihaylov/bert-base-bg) fine-tuned for named entity recognition on the Bulgarian subset of [wikiann](https://huggingface.co/datasets/wikiann).

It achieves an F1-score of *0.99* on that dataset.
|
|
|
## Usage

Import the libraries:

```python
from pprint import pprint

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
```
|
|
|
Load the model:

```python
MODEL_ID = "auhide/bert-base-ner-bulgarian"

model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

ner = pipeline(task="ner", model=model, tokenizer=tokenizer)
```
|
|
|
Do inference:

```python
text = "Философът Барух Спиноза е роден в Амстердам."
pprint(ner(text))
```
|
|
|
```sh
[{'end': 13,
  'entity': 'B-PER',
  'index': 3,
  'score': 0.9954899,
  'start': 9,
  'word': '▁Бар'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 4,
  'score': 0.9660787,
  'start': 13,
  'word': 'ух'},
 {'end': 23,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.99728084,
  'start': 15,
  'word': '▁Спиноза'},
 {'end': 43,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.8990479,
  'start': 33,
  'word': '▁Амстердам'}]
```
|
|
|
Note: The model predicts three entity types: `PER` (person), `ORG` (organization), and `LOC` (location), using the `B-`/`I-` prefix scheme shown in the output above.
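
The pipeline output above is per subword (e.g. `▁Бар` + `ух` + `▁Спиноза`), so adjacent pieces of the same entity must be merged to recover whole mentions. Below is a minimal post-processing sketch that uses only the `entity`, `start`, and `end` fields of each prediction; the `merge_entities` helper is hypothetical, not part of transformers:

```python
def merge_entities(text, tokens):
    """Merge per-subword B-/I- predictions into (mention, label) pairs."""
    spans = []
    for tok in tokens:
        prefix, label = tok["entity"].split("-", 1)
        if prefix == "B" or not spans or spans[-1]["label"] != label:
            # A B- tag (or a label change) starts a new entity span.
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
        else:
            # An I- tag extends the current entity span.
            spans[-1]["end"] = tok["end"]
    # Slice the original text; strip the leading space kept by SentencePiece.
    return [(text[s["start"]:s["end"]].strip(), s["label"]) for s in spans]


text = "Философът Барух Спиноза е роден в Амстердам."
tokens = [
    {"entity": "B-PER", "start": 9, "end": 13},
    {"entity": "I-PER", "start": 13, "end": 15},
    {"entity": "I-PER", "start": 15, "end": 23},
    {"entity": "B-LOC", "start": 33, "end": 43},
]
print(merge_entities(text, tokens))
# [('Барух Спиноза', 'PER'), ('Амстердам', 'LOC')]
```

Alternatively, recent versions of transformers accept an `aggregation_strategy` argument to `pipeline`, which performs similar subword grouping internally.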