metadata
tags:
- token-classification
datasets:
- djagatiya/ner-ontonotes-v5-eng-v4
widget:
- text: On September 1st George won 1 dollar while watching Game of Thrones.
(NER) roberta-base : conll2012_ontonotesv5-english-v4
This roberta-base NER model was fine-tuned on the english-v4 version of the conll2012_ontonotesv5 dataset.
Check out the NER-System repository for more information.
Dataset
conll2012_ontonotesv5
- Language : English
- Version : v4
| Dataset  | Examples |
| -------- | -------- |
| Training | 75187    |
| Testing  | 9479     |
Evaluation
- Precision: 88.88
- Recall: 90.69
- F1-Score: 89.78
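The F1-score above is consistent with the harmonic mean of the reported precision and recall. A minimal sketch of that arithmetic, assuming all three figures are percentages computed over the same set of predictions (the function name `f1_from_precision_recall` is illustrative and does not come from the repository):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values copied from the list above (percentages).
f1 = f1_from_precision_recall(88.88, 90.69)
print(f"{f1:.2f}")  # prints 89.78
```

Because the inputs are already rounded to two decimals, the result can only be expected to match the reported F1 to about that precision.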
Check out the eval.log file for the evaluation metrics and the classification report.
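| Entity type  | precision | recall | f1-score | support |
| ------------ | --------- | ------ | -------- | ------- |
| CARDINAL     | 0.84      | 0.85   | 0.85     | 935     |
| DATE         | 0.85      | 0.90   | 0.87     | 1602    |
| EVENT        | 0.67      | 0.76   | 0.71     | 63      |
| FAC          | 0.74      | 0.72   | 0.73     | 135     |
| GPE          | 0.97      | 0.96   | 0.96     | 2240    |
| LANGUAGE     | 0.83      | 0.68   | 0.75     | 22      |
| LAW          | 0.66      | 0.62   | 0.64     | 40      |
| LOC          | 0.74      | 0.80   | 0.77     | 179     |
| MONEY        | 0.85      | 0.89   | 0.87     | 314     |
| NORP         | 0.93      | 0.96   | 0.95     | 841     |
| ORDINAL      | 0.81      | 0.89   | 0.85     | 195     |
| ORG          | 0.90      | 0.91   | 0.91     | 1795    |
| PERCENT      | 0.90      | 0.92   | 0.91     | 349     |
| PERSON       | 0.95      | 0.95   | 0.95     | 1988    |
| PRODUCT      | 0.74      | 0.83   | 0.78     | 76      |
| QUANTITY     | 0.76      | 0.80   | 0.78     | 105     |
| TIME         | 0.62      | 0.67   | 0.65     | 212     |
| WORK_OF_ART  | 0.58      | 0.69   | 0.63     | 166     |
| micro avg    | 0.89      | 0.91   | 0.90     | 11257   |
| macro avg    | 0.80      | 0.82   | 0.81     | 11257   |
| weighted avg | 0.89      | 0.91   | 0.90     | 11257   |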
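The macro and weighted averages in the report differ only in how the per-class scores are combined: the macro average treats every entity type equally, while the weighted average scales each type by its support. The sketch below recomputes both F1 averages from the per-class rows of the report. The variable names are illustrative, and since the inputs are rounded to two decimals the results should only be expected to agree with the report after rounding.

```python
# (f1-score, support) per entity type, copied from the classification report.
per_class = {
    "CARDINAL": (0.85, 935),
    "DATE": (0.87, 1602),
    "EVENT": (0.71, 63),
    "FAC": (0.73, 135),
    "GPE": (0.96, 2240),
    "LANGUAGE": (0.75, 22),
    "LAW": (0.64, 40),
    "LOC": (0.77, 179),
    "MONEY": (0.87, 314),
    "NORP": (0.95, 841),
    "ORDINAL": (0.85, 195),
    "ORG": (0.91, 1795),
    "PERCENT": (0.91, 349),
    "PERSON": (0.95, 1988),
    "PRODUCT": (0.78, 76),
    "QUANTITY": (0.78, 105),
    "TIME": (0.65, 212),
    "WORK_OF_ART": (0.63, 166),
}

total_support = sum(support for _, support in per_class.values())

# Macro average: unweighted mean over entity types.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: each entity type contributes in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(total_support)         # prints 11257
print(f"{macro_f1:.2f}")     # prints 0.81
print(f"{weighted_f1:.2f}")  # prints 0.90
```

The gap between the two averages comes from low-support types such as LAW, TIME, and WORK_OF_ART, which pull the macro average down but carry little weight in the weighted one.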
Usage
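from transformers import pipeline

# Load the fine-tuned model; 'simple' aggregation groups adjacent tokens
# of the same entity into a single span.
ner_pipeline = pipeline(
    'token-classification',
    model=r'djagatiya/ner-roberta-base-ontonotesv5-englishv4',
    aggregation_strategy='simple'
)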
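To give a rough idea of what span aggregation does, the sketch below merges per-token tags into entity spans. It is a conceptual illustration only, not the transformers implementation: it assumes the model emits BIO-style tags (`B-`, `I-`, `O`), which this card does not state, and the function `merge_bio` and the sample tags are hypothetical.

```python
def merge_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # A span starts on a B- tag, or on an I- tag whose label differs
        # from the span currently being built.
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)
        if tag == "O" or starts_new:
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
        if tag != "O":
            current_label = label
            current_tokens.append(token)
    if current_tokens:  # flush a span that runs to the end of the input
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical word-level tags for part of the widget sentence.
tokens = ["On", "September", "1st", "George", "won"]
tags = ["O", "B-DATE", "I-DATE", "B-PERSON", "O"]
spans = merge_bio(tokens, tags)
print(spans)  # prints [('DATE', 'September 1st'), ('PERSON', 'George')]
```

The real pipeline works on sub-word tokens and also reports a score and character offsets for each span, which this sketch leaves out.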
TEST 1
ner_pipeline("India is a beautiful country")
# Output
[{'entity_group': 'GPE',
  'score': 0.99186057,
  'word': ' India',
  'start': 0,
  'end': 5}]
TEST 2
ner_pipeline("On September 1st George won 1 dollar while watching Game of Thrones.")
# Output
[{'entity_group': 'DATE',
  'score': 0.99720246,
  'word': ' September 1st',
  'start': 3,
  'end': 16},
 {'entity_group': 'PERSON',
  'score': 0.99071586,
  'word': ' George',
  'start': 17,
  'end': 23},
 {'entity_group': 'MONEY',
  'score': 0.9872978,
  'word': ' 1 dollar',
  'start': 28,
  'end': 36},
 {'entity_group': 'WORK_OF_ART',
  'score': 0.9946732,
  'word': ' Game of Thrones',
  'start': 52,
  'end': 67}]
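Note that each `word` value in the outputs above carries a leading space, which is most likely an artifact of RoBERTa's tokenizer. One way to get clean surface strings is to slice the input text with the `start` and `end` offsets instead. A minimal sketch, using the TEST 2 output copied from above so that it runs without downloading the model (the helper name `extract_entities` is illustrative):

```python
text = "On September 1st George won 1 dollar while watching Game of Thrones."

# Pipeline output copied from TEST 2 above.
entities = [
    {"entity_group": "DATE", "score": 0.99720246,
     "word": " September 1st", "start": 3, "end": 16},
    {"entity_group": "PERSON", "score": 0.99071586,
     "word": " George", "start": 17, "end": 23},
    {"entity_group": "MONEY", "score": 0.9872978,
     "word": " 1 dollar", "start": 28, "end": 36},
    {"entity_group": "WORK_OF_ART", "score": 0.9946732,
     "word": " Game of Thrones", "start": 52, "end": 67},
]


def extract_entities(text, entities):
    """Return (surface, label) pairs using character offsets into the text."""
    return [(text[e["start"]:e["end"]], e["entity_group"]) for e in entities]


pairs = extract_entities(text, entities)
for surface, label in pairs:
    print(f"{label}: {surface}")
# prints:
# DATE: September 1st
# PERSON: George
# MONEY: 1 dollar
# WORK_OF_ART: Game of Thrones
```

Slicing by offsets also preserves the original casing and spacing of the input, which calling `strip()` on `word` does not guarantee.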