---

tags:
- token-classification
datasets:
- djagatiya/ner-ontonotes-v5-eng-v4
widget:
- text: "On September 1st George won 1 dollar while watching Game of Thrones."

---

# (NER) roberta-base : conll2012_ontonotesv5-english-v4

This `roberta-base` NER model was fine-tuned on the `english-v4` version of the `conll2012_ontonotesv5` dataset. <br>
Check out the [NER-System repository](https://github.com/djagatiya/NER-System) for more information.

## Dataset
- conll2012_ontonotesv5
    - Language: English
    - Version: v4

  | Dataset | Examples |
  | --- | --- | 
  | Training | 75187 | 
  | Testing | 9479 |
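
The dataset is available on the Hugging Face Hub. Below is a minimal sketch of loading it and aligning the word-level tags to `roberta-base` sub-tokens; it assumes the hub copy's document-level schema (field names like `words` and `named_entities`), and the actual training code lives in the NER-System repository.

```
# A sketch only: the hub copy of the dataset stores whole documents,
# each holding a list of sentences.
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("conll2012_ontonotesv5", "english_v4")

# Flatten documents into the sentence-level examples counted above.
train_sents = [s for doc in ds["train"] for s in doc["sentences"]]
print(len(train_sents))                  # sentence count, as in the table
print(train_sents[0]["words"])           # word-tokenized sentence
print(train_sents[0]["named_entities"])  # IOB2 tag ids (O plus B-/I- per type)

# roberta-base uses byte-level BPE, so pre-split words need a prefix space.
tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)

def tokenize_and_align(sentence):
    # Copy each word's tag to its first sub-token and mask the
    # remaining sub-tokens with -100 so the loss ignores them.
    enc = tokenizer(sentence["words"], truncation=True, is_split_into_words=True)
    labels, prev = [], None
    for wid in enc.word_ids():
        labels.append(-100 if wid is None or wid == prev
                      else sentence["named_entities"][wid])
        prev = wid
    enc["labels"] = labels
    return enc
```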

## Evaluation

- Precision: 88.88%
- Recall: 90.69%
- F1 score: 89.78%

> Check out the [eval.log](eval.log) file for the full evaluation metrics and classification report.

```
                precision    recall  f1-score   support

    CARDINAL       0.84      0.85      0.85       935
        DATE       0.85      0.90      0.87      1602
       EVENT       0.67      0.76      0.71        63
         FAC       0.74      0.72      0.73       135
         GPE       0.97      0.96      0.96      2240
    LANGUAGE       0.83      0.68      0.75        22
         LAW       0.66      0.62      0.64        40
         LOC       0.74      0.80      0.77       179
       MONEY       0.85      0.89      0.87       314
        NORP       0.93      0.96      0.95       841
     ORDINAL       0.81      0.89      0.85       195
         ORG       0.90      0.91      0.91      1795
     PERCENT       0.90      0.92      0.91       349
      PERSON       0.95      0.95      0.95      1988
     PRODUCT       0.74      0.83      0.78        76
    QUANTITY       0.76      0.80      0.78       105
        TIME       0.62      0.67      0.65       212
 WORK_OF_ART       0.58      0.69      0.63       166

   micro avg       0.89      0.91      0.90     11257
   macro avg       0.80      0.82      0.81     11257
weighted avg       0.89      0.91      0.90     11257
```
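
The report above follows the format of `seqeval`, which scores entities at the span level rather than per token. Here is a toy sketch of producing such a report; the tag sequences are illustrative only, not taken from this model's evaluation.

```
# Illustrative only: y_true/y_pred are IOB2 tag sequences, one list per sentence.
from seqeval.metrics import classification_report

y_true = [["B-DATE", "I-DATE", "O", "B-PERSON", "O"]]
y_pred = [["B-DATE", "I-DATE", "O", "B-PERSON", "O"]]

# Span-level precision/recall/F1 per entity type, like the table above.
print(classification_report(y_true, y_pred, digits=2))
```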

## Usage

```
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
ner_pipeline = pipeline(
    'token-classification',
    model='djagatiya/ner-roberta-base-ontonotesv5-englishv4',
    aggregation_strategy='simple'
)
```
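Here `aggregation_strategy='simple'` tells the pipeline to merge sub-word tokens back into whole entity spans, which is why the outputs below contain one dictionary per entity rather than one per token.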
### Test 1
```
ner_pipeline("India is a beautiful country")
```

```
# Output
[{'entity_group': 'GPE',
  'score': 0.99186057,
  'word': ' India',
  'start': 0,
  'end': 5}]
```

### Test 2

```
ner_pipeline("On September 1st George won 1 dollar while watching Game of Thrones.")
```

```
# Output
[{'entity_group': 'DATE',
  'score': 0.99720246,
  'word': ' September 1st',
  'start': 3,
  'end': 16},
 {'entity_group': 'PERSON',
  'score': 0.99071586,
  'word': ' George',
  'start': 17,
  'end': 23},
 {'entity_group': 'MONEY',
  'score': 0.9872978,
  'word': ' 1 dollar',
  'start': 28,
  'end': 36},
 {'entity_group': 'WORK_OF_ART',
  'score': 0.9946732,
  'word': ' Game of Thrones',
  'start': 52,
  'end': 67}]
```