---
inference: true
license: cc-by-4.0
datasets:
- wikiann
language:
- bg
metrics:
- f1
pipeline_tag: token-classification
widget:
- text: 'Философът Барух Спиноза е роден в Амстердам.'
---

# 🇧🇬 BERT - Bulgarian Named Entity Recognition

This is the model [rmihaylov/bert-base-bg](https://huggingface.co/rmihaylov/bert-base-bg) fine-tuned on the Bulgarian subset of [wikiann](https://huggingface.co/datasets/wikiann). It achieves an F1-score of *0.99* on that dataset.

## Usage

Import the libraries:

```python
from pprint import pprint

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
```

Load the model:

```python
MODEL_ID = "auhide/bert-base-ner-bulgarian"

model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
ner = pipeline(task="ner", model=model, tokenizer=tokenizer)
```

Do inference:

```python
# "The philosopher Baruch Spinoza was born in Amsterdam."
text = "Философът Барух Спиноза е роден в Амстердам."
pprint(ner(text))
```

```sh
[{'end': 13,
  'entity': 'B-PER',
  'index': 3,
  'score': 0.9954899,
  'start': 9,
  'word': '▁Бар'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 4,
  'score': 0.9660787,
  'start': 13,
  'word': 'ух'},
 {'end': 23,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.99728084,
  'start': 15,
  'word': '▁Спиноза'},
 {'end': 43,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.8990479,
  'start': 33,
  'word': '▁Амстердам'}]
```

Note: There are three entity types - `PER`, `ORG`, and `LOC`.
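
The output above is per subword token. If you want whole entities instead of subword pieces, the `transformers` token-classification pipeline accepts an `aggregation_strategy` argument. A minimal sketch (not part of the original example, using the same model ID as above):

```python
from pprint import pprint

from transformers import pipeline

# aggregation_strategy="simple" merges subword pieces (e.g. "▁Бар" + "ух" + "▁Спиноза")
# into whole entities, returned with an `entity_group` key instead of B-/I- tags.
ner_grouped = pipeline(
    task="ner",
    model="auhide/bert-base-ner-bulgarian",
    aggregation_strategy="simple",
)

pprint(ner_grouped("Философът Барух Спиноза е роден в Амстердам."))
# Expected (scores will vary): a PER entity for "Барух Спиноза"
# and a LOC entity for "Амстердам".
```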