---
license: cc-by-4.0
datasets:
- wikiann
language:
- bg
model-index:
- name: bert-base-ner-bulgarian
  results: []
metrics:
- f1
pipeline_tag: token-classification
widget:
- text: 'Философът Барух Спиноза е роден в Амстердам.'
---
|
|
|
# 🇧🇬 BERT - Bulgarian Named Entity Recognition

This model is [rmihaylov/bert-base-bg](https://huggingface.co/rmihaylov/bert-base-bg) fine-tuned for named entity recognition on the Bulgarian subset of [wikiann](https://huggingface.co/datasets/wikiann).

It achieves an F1-score of *0.99* on that dataset.
|
|
|
## Usage

Import the libraries:

```python
from pprint import pprint

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
```
|
|
|
Load the model:

```python
MODEL_ID = "auhide/bert-base-ner-bulgarian"

model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

ner = pipeline(task="ner", model=model, tokenizer=tokenizer)
```
|
|
|
Do inference:

```python
text = "Философът Барух Спиноза е роден в Амстердам."
pprint(ner(text))
```
|
|
|
```sh
[{'end': 13,
  'entity': 'B-PER',
  'index': 3,
  'score': 0.9954899,
  'start': 9,
  'word': '▁Бар'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 4,
  'score': 0.9660787,
  'start': 13,
  'word': 'ух'},
 {'end': 23,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.99728084,
  'start': 15,
  'word': '▁Спиноза'},
 {'end': 43,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.8990479,
  'start': 33,
  'word': '▁Амстердам'}]
```
|
|
|
Note: The model predicts three entity types: `PER` (person), `ORG` (organization), and `LOC` (location), using the `B-`/`I-` prefix scheme shown in the output above.
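
The pipeline output above is per subword (e.g. `▁Бар` + `ух` + `▁Спиноза`), so adjacent pieces of the same entity must be merged to recover whole mentions. Below is a minimal post-processing sketch that uses only the `entity`, `start`, and `end` fields of each prediction; the `merge_entities` helper is hypothetical, not part of transformers:

```python
def merge_entities(text, tokens):
    """Merge per-subword B-/I- predictions into (mention, label) pairs."""
    spans = []
    for tok in tokens:
        prefix, label = tok["entity"].split("-", 1)
        if prefix == "B" or not spans or spans[-1]["label"] != label:
            # A B- tag (or a label change) starts a new entity span.
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
        else:
            # An I- tag extends the current entity span.
            spans[-1]["end"] = tok["end"]
    # Slice the original text; strip the leading space kept by SentencePiece.
    return [(text[s["start"]:s["end"]].strip(), s["label"]) for s in spans]


text = "Философът Барух Спиноза е роден в Амстердам."
tokens = [
    {"entity": "B-PER", "start": 9, "end": 13},
    {"entity": "I-PER", "start": 13, "end": 15},
    {"entity": "I-PER", "start": 15, "end": 23},
    {"entity": "B-LOC", "start": 33, "end": 43},
]
print(merge_entities(text, tokens))
# [('Барух Спиноза', 'PER'), ('Амстердам', 'LOC')]
```

Alternatively, recent versions of transformers accept an `aggregation_strategy` argument to `pipeline`, which performs similar subword grouping internally.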