---

tags:
- token-classification
datasets:
- djagatiya/ner-ontonotes-v5-eng-v4
widget:
- text: "On September 1st George won 1 dollar while watching Game of Thrones."

---

# (NER) roberta-base : conll2012_ontonotesv5-english-v4

This `roberta-base` NER model was fine-tuned on the `english-v4` version of the `conll2012_ontonotesv5` dataset. <br>
Check out the [NER-System repository](https://github.com/djagatiya/NER-System) for more information.

## Dataset
- conll2012_ontonotesv5
    - Language: English
    - Version: v4

  | Dataset | Examples |
  | --- | --- | 
  | Training | 75187 | 
  | Testing | 9479 |
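
The dataset is available on the Hugging Face Hub. Below is a minimal sketch of loading it and aligning the word-level tags to `roberta-base` sub-tokens; it assumes the hub copy's document-level schema (field names like `words` and `named_entities`), and the actual training code lives in the NER-System repository.

```
# A sketch only: the hub copy of the dataset stores whole documents,
# each holding a list of sentences.
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("conll2012_ontonotesv5", "english_v4")

# Flatten documents into the sentence-level examples counted above.
train_sents = [s for doc in ds["train"] for s in doc["sentences"]]
print(len(train_sents))                  # sentence count, as in the table
print(train_sents[0]["words"])           # word-tokenized sentence
print(train_sents[0]["named_entities"])  # IOB2 tag ids (O plus B-/I- per type)

# roberta-base uses byte-level BPE, so pre-split words need a prefix space.
tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)

def tokenize_and_align(sentence):
    # Copy each word's tag to its first sub-token and mask the
    # remaining sub-tokens with -100 so the loss ignores them.
    enc = tokenizer(sentence["words"], truncation=True, is_split_into_words=True)
    labels, prev = [], None
    for wid in enc.word_ids():
        labels.append(-100 if wid is None or wid == prev
                      else sentence["named_entities"][wid])
        prev = wid
    enc["labels"] = labels
    return enc
```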

## Evaluation

- Precision: 88.88%
- Recall: 90.69%
- F1 score: 89.78%

> Check out the [eval.log](eval.log) file for the full evaluation metrics and classification report.

```
                precision    recall  f1-score   support

    CARDINAL       0.84      0.85      0.85       935
        DATE       0.85      0.90      0.87      1602
       EVENT       0.67      0.76      0.71        63
         FAC       0.74      0.72      0.73       135
         GPE       0.97      0.96      0.96      2240
    LANGUAGE       0.83      0.68      0.75        22
         LAW       0.66      0.62      0.64        40
         LOC       0.74      0.80      0.77       179
       MONEY       0.85      0.89      0.87       314
        NORP       0.93      0.96      0.95       841
     ORDINAL       0.81      0.89      0.85       195
         ORG       0.90      0.91      0.91      1795
     PERCENT       0.90      0.92      0.91       349
      PERSON       0.95      0.95      0.95      1988
     PRODUCT       0.74      0.83      0.78        76
    QUANTITY       0.76      0.80      0.78       105
        TIME       0.62      0.67      0.65       212
 WORK_OF_ART       0.58      0.69      0.63       166

   micro avg       0.89      0.91      0.90     11257
   macro avg       0.80      0.82      0.81     11257
weighted avg       0.89      0.91      0.90     11257
```
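
The report above follows the format of `seqeval`, which scores entities at the span level rather than per token. Here is a toy sketch of producing such a report; the tag sequences are illustrative only, not taken from this model's evaluation.

```
# Illustrative only: y_true/y_pred are IOB2 tag sequences, one list per sentence.
from seqeval.metrics import classification_report

y_true = [["B-DATE", "I-DATE", "O", "B-PERSON", "O"]]
y_pred = [["B-DATE", "I-DATE", "O", "B-PERSON", "O"]]

# Span-level precision/recall/F1 per entity type, like the table above.
print(classification_report(y_true, y_pred, digits=2))
```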

## Usage

```
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
ner_pipeline = pipeline(
    'token-classification',
    model='djagatiya/ner-roberta-base-ontonotesv5-englishv4',
    aggregation_strategy='simple'
)
```
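Here `aggregation_strategy='simple'` tells the pipeline to merge sub-word tokens back into whole entity spans, which is why the outputs below contain one dictionary per entity rather than one per token.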
### Test 1
```
ner_pipeline("India is a beautiful country")
```

```
# Output
[{'entity_group': 'GPE',
  'score': 0.99186057,
  'word': ' India',
  'start': 0,
  'end': 5}]
```

### Test 2

```
ner_pipeline("On September 1st George won 1 dollar while watching Game of Thrones.")
```

```
# Output
[{'entity_group': 'DATE',
  'score': 0.99720246,
  'word': ' September 1st',
  'start': 3,
  'end': 16},
 {'entity_group': 'PERSON',
  'score': 0.99071586,
  'word': ' George',
  'start': 17,
  'end': 23},
 {'entity_group': 'MONEY',
  'score': 0.9872978,
  'word': ' 1 dollar',
  'start': 28,
  'end': 36},
 {'entity_group': 'WORK_OF_ART',
  'score': 0.9946732,
  'word': ' Game of Thrones',
  'start': 52,
  'end': 67}]
```