---
library_name: transformers
tags:
- medical
language:
- ru
base_model:
- Babelscape/wikineural-multilingual-ner
datasets:
- Mykes/patient_queries_ner
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63565a3d58acee56a457f799/2M8J-5WABcDZe1TwY4HXk.png)
# Model Card
A NER model for recognizing entities in Russian medical queries.
### Model Description
This model is fine-tuned on 4,756 Russian patient queries from [Mykes/patient_queries_ner](https://huggingface.co/datasets/Mykes/patient_queries_ner).
**The NER entities are**:
- **B-SIM, I-SIM**: symptoms;
- **B-SUBW, I-SUBW**: subway station;
- **GEN**: gender;
- **CHILD**: child mention;
- **B-SPEC, I-SPEC**: physician speciality.
It's based on the [Babelscape/wikineural-multilingual-ner](https://huggingface.co/Babelscape/wikineural-multilingual-ner) 177M mBERT model.
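Under the BIO scheme listed above, a `B-` tag opens an entity and following `I-` tags of the same type extend it. A minimal pure-Python sketch of this grouping (no model required; the function name and inputs are illustrative):

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            spans.append([tag[2:], [token]])            # open a new entity
        elif tag.startswith("I-") and spans and spans[-1][0] == tag[2:]:
            spans[-1][1].append(token)                  # extend the open entity
        # "O" and orphan I- tags are skipped here for simplicity
    return [(label, " ".join(words)) for label, words in spans]

print(bio_to_spans(["болит", "голова", "невролога"],
                   ["B-SIM", "I-SIM", "B-SPEC"]))
# [('SIM', 'болит голова'), ('SPEC', 'невролога')]
```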
## Training info
Training parameters:
```
MAX_LEN = 256
TRAIN_BATCH_SIZE = 4
VALID_BATCH_SIZE = 2
EPOCHS = 5
LEARNING_RATE = 1e-05
MAX_GRAD_NORM = 10
```
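The training script itself is not published; a hedged sketch of how these hyperparameters would map onto a 🤗 `Trainer`-style setup (the output directory, label count, and dataset wiring are assumptions):

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          TrainingArguments)

BASE = "Babelscape/wikineural-multilingual-ner"
tokenizer = AutoTokenizer.from_pretrained(BASE, model_max_length=256)  # MAX_LEN
model = AutoModelForTokenClassification.from_pretrained(
    BASE, num_labels=9, ignore_mismatched_sizes=True)  # 8 BIO tags + "O" (assumed)

# Hyperparameters from the table above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="med_bert_ner",
    per_device_train_batch_size=4,   # TRAIN_BATCH_SIZE
    per_device_eval_batch_size=2,    # VALID_BATCH_SIZE
    num_train_epochs=5,              # EPOCHS
    learning_rate=1e-5,              # LEARNING_RATE
    max_grad_norm=10.0,              # MAX_GRAD_NORM
)
```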
The loss and accuracy at epoch 5:
```
Training loss epoch: 0.004890048759878736
Training accuracy epoch: 0.9896078955134066
```
The validation results:
```
Validation Loss: 0.008194072216433625
Validation Accuracy: 0.9859073599112612
```
Detailed per-entity metrics (precision, recall, F1):
```
              precision    recall  f1-score   support
         GEN       1.00      0.98      0.99        84
       CHILD       1.00      0.99      0.99       436
         SIM       0.96      0.96      0.96      5355
        SPEC       0.99      1.00      0.99       751
        SUBW       0.99      1.00      0.99       327
   micro avg       0.96      0.97      0.97      6953
   macro avg       0.99      0.98      0.99      6953
weighted avg       0.96      0.97      0.97      6953
```
## Results:
The model does not always capture entire words, but it correctly detects individual word pieces even when the words are misspelled.
For example, the query "У меня треога и норушения сна. Подскажи хорошего психотервта в районе метро Октбрьской." ("I have anxiety and sleep disturbances. Recommend a good psychotherapist near the Oktyabrskaya metro station.", with deliberate misspellings) returns:
```
B-SIM I-SIM I-SIM B-SIM I-SIM I-SIM B-SPEC I-SPEC I-SPEC I-SPEC I-SPEC B-SUBW I-SUBW I-SUBW I-SUBW
т ре ога но ру шения сна пс их о тер вта ок т брь ской
```
As you can see, it correctly detects even the misspelled words: треога, норушения, психотервта.
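In standard BERT WordPiece notation, continuation pieces carry a `##` prefix (the output above shows them without it). A small sketch of rejoining such pieces into whole words (function name is illustrative):

```python
def merge_wordpieces(pieces):
    """Rejoin WordPiece tokens ("##" marks a continuation) into words."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            words[-1] += piece[2:]      # glue continuation onto previous word
        else:
            words.append(piece)         # start a new word
    return words

print(merge_wordpieces(["пс", "##их", "##о", "##тер", "##вта"]))
# ['психотервта']
```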
## The simplest way to use the model with 🤗 transformers pipeline:
```
from transformers import pipeline

pipe = pipeline(task="ner", model='Mykes/med_bert_ner', tokenizer='Mykes/med_bert_ner', aggregation_strategy="average")
query = "У меня болит голова. Посоветуй невролога на проспекте мира"
results = pipe(query.lower().strip('.,\n '))
# The output:
# [{'entity_group': 'SIM',
# 'score': 0.9920678,
# 'word': 'болит голова',
# 'start': 7,
# 'end': 19},
# {'entity_group': 'SPEC',
# 'score': 0.9985348,
# 'word': 'невролога',
# 'start': 31,
# 'end': 40},
# {'entity_group': 'SUBW',
# 'score': 0.68749845,
# 'word': 'проспекте мира',
# 'start': 44,
# 'end': 58}]
```
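The aggregated output can then be folded into a simple per-entity mapping for downstream use (e.g. routing a query to a physician search). A pure-Python sketch over the example output above (no model call; the helper name is illustrative):

```python
def entities_to_fields(entities):
    """Collect pipeline NER output into a {entity_group: [words]} mapping."""
    fields = {}
    for ent in entities:
        fields.setdefault(ent["entity_group"], []).append(ent["word"])
    return fields

sample = [
    {"entity_group": "SIM", "word": "болит голова", "score": 0.99},
    {"entity_group": "SPEC", "word": "невролога", "score": 0.99},
    {"entity_group": "SUBW", "word": "проспекте мира", "score": 0.69},
]
print(entities_to_fields(sample))
# {'SIM': ['болит голова'], 'SPEC': ['невролога'], 'SUBW': ['проспекте мира']}
```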