---
library_name: transformers
tags:
- medical
language:
- ru
base_model:
- Babelscape/wikineural-multilingual-ner
---

# Model Card

A model for named-entity recognition (NER) in Russian medical requests.

### Model Description

This model is fine-tuned on 4,756 Russian patient requests.

**The NER entities are**:
- **B-SIM, I-SIM**: symptoms;
- **B-SUBW, I-SUBW**: subway (metro) station;
- **GEN**: gender;
- **CHILD**: child mention;
- **B-SPEC, I-SPEC**: physician specialty.

It is based on the [Babelscape/wikineural-multilingual-ner](https://huggingface.co/Babelscape/wikineural-multilingual-ner) 177M-parameter mBERT model.
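
SIM, SUBW, and SPEC follow the BIO scheme, while GEN and CHILD are single tags. A hypothetical label mapping consistent with this list might look like the sketch below; the ordering and the `O` tag are assumptions, and the authoritative mapping lives in the model's `config.json`:
```
# Hypothetical id2label mapping; the authoritative one is in config.json
id2label = {
    0: "O",                      # not an entity
    1: "B-SIM",  2: "I-SIM",     # symptoms
    3: "B-SUBW", 4: "I-SUBW",    # subway (metro) station
    5: "GEN",                    # gender
    6: "CHILD",                  # child mention
    7: "B-SPEC", 8: "I-SPEC",    # physician specialty
}
```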

## Training info
Training parameters:
```
MAX_LEN = 256
TRAIN_BATCH_SIZE = 4
VALID_BATCH_SIZE = 2
EPOCHS = 5
LEARNING_RATE = 1e-05
MAX_GRAD_NORM = 10
```
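As an illustration only (not the original training script), these values map naturally onto a 🤗 `Trainer` setup; the dataset preparation is omitted, and `num_labels` and the output path are assumptions:
```
# Illustrative sketch: how the hyperparameters above could map onto
# a Hugging Face Trainer. Dataset loading/tokenization is assumed.
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "Babelscape/wikineural-multilingual-ner"
tokenizer = AutoTokenizer.from_pretrained(base, model_max_length=256)  # MAX_LEN
model = AutoModelForTokenClassification.from_pretrained(
    base, num_labels=9, ignore_mismatched_sizes=True  # 9 labels is an assumption
)

args = TrainingArguments(
    output_dir="med_bert_ner",
    num_train_epochs=5,                # EPOCHS
    per_device_train_batch_size=4,     # TRAIN_BATCH_SIZE
    per_device_eval_batch_size=2,      # VALID_BATCH_SIZE
    learning_rate=1e-5,                # LEARNING_RATE
    max_grad_norm=10,                  # MAX_GRAD_NORM
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)  # assumed datasets
# trainer.train()
```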
The loss and accuracy at epoch 5:
```
Training loss epoch: 0.004890048759878736
Training accuracy epoch: 0.9896078955134066
```
The validation results:
```
Validation Loss: 0.008194072216433625
Validation Accuracy: 0.9859073599112612
```
Detailed per-entity metrics (precision, recall, and f1-score):
```
              precision    recall  f1-score   support

         GEN       1.00      0.98      0.99        84
       CHILD       1.00      0.99      0.99       436
         SIM       0.96      0.96      0.96      5355
        SPEC       0.99      1.00      0.99       751
        SUBW       0.99      1.00      0.99       327

   micro avg       0.96      0.97      0.97      6953
   macro avg       0.99      0.98      0.99      6953
weighted avg       0.96      0.97      0.97      6953
```
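A report in this format can be produced with `seqeval`'s `classification_report` (the actual evaluation tooling is an assumption, since the card does not name it):
```
# Minimal sketch: per-entity report with seqeval (assumed tooling)
from seqeval.metrics import classification_report

# One list of tags per sentence, gold and predicted
y_true = [["O", "B-SIM", "I-SIM", "O", "B-SPEC", "I-SPEC"]]
y_pred = [["O", "B-SIM", "I-SIM", "O", "B-SPEC", "O"]]
print(classification_report(y_true, y_pred, digits=2))
```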
## Results
The model does not always tag whole words, but it correctly detects individual word pieces, even when the words are misspelled.

For example, the query "У меня треога и норушения сна. Подскажи хорошего психотервта в районе метро Октбрьской." (a misspelled rendering of "I have anxiety and sleep disturbances. Recommend a good psychotherapist near the Oktyabrskaya metro station.") returns:
```
B-SIM I-SIM I-SIM  B-SIM  I-SIM   I-SIM     B-SPEC I-SPEC I-SPEC I-SPEC  I-SPEC     B-SUBW  I-SUBW I-SUBW  I-SUBW
т      ре    ога    но     ру    шения сна    пс     их   о      тер     вта          ок       т     брь    ской
```
As you can see, it correctly detects even the misspelled words: треога, норушения, психотервта.

## The simplest way to use the model is with the 🤗 transformers pipeline:
```
from transformers import pipeline

pipe = pipeline(task="ner", model="Mykes/med_bert_ner",
                tokenizer="Mykes/med_bert_ner", aggregation_strategy="simple")
query = "У меня треога и норушения сна. Подскажи хорошего психотервта в районе метро Октбрьской."
results = pipe(query.lower())
```
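With `aggregation_strategy="simple"`, each item in `results` is a dict with `entity_group`, `score`, `word`, `start`, and `end` keys, so the detected spans can be printed like this:
```
for r in results:
    print(f"{r['entity_group']:7s} {r['score']:.2f} {r['word']}")
```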