---
license: cc-by-nc-4.0
language: "en"
tags:
- longformer
- clinical
- biomedical
---

<span style="font-size:larger;">**KEPTlongformer**</span> is a medical-knowledge-enhanced version of Longformer that was further pre-trained using [contrastive learning](https://arxiv.org/pdf/2210.03304.pdf).

### Pre-training

We initialized this model from RoBERTa-base-PM-M3-Voc-distill, released by Facebook in [bio-lm](https://github.com/facebookresearch/bio-lm/).

We then pre-trained it with Hierarchical Self-Alignment Pretrain (HSAP) on the UMLS knowledge graph. HSAP covers three components: (a) hierarchy, (b) synonym, and (c) abbreviation. For details, see Section 3.3 of the [paper](https://arxiv.org/pdf/2210.03304.pdf).

The learning rate was 5e-5, the weight decay was 0.01, and the Adam epsilon was 1e-5.

### Usage

Try the following sentence with the Fill-Mask widget on the right. The masked token is "cardiac".

```
74F with HTN, HLD, DM2, newly diagnosed atrial fibrillation in October who was transferred to hospital for <mask> catheterization after presentation there with syncopal episode.
```

Or load the model directly with Transformers:

```
# AutoConfig must be imported too, since it is used below.
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

# Load the tokenizer, configuration, and masked-LM weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("whaleloops/KEPTlongformer-PMM3")
config = AutoConfig.from_pretrained("whaleloops/KEPTlongformer-PMM3")
model = AutoModelForMaskedLM.from_pretrained("whaleloops/KEPTlongformer-PMM3", config=config)
```
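
The Fill-Mask example can also be run from code. The following is a minimal sketch that assumes the checkpoint loads through the standard Transformers `fill-mask` pipeline, as the widget suggests. The helper name `top_mask_predictions` and its `top_k` default are illustrative and not part of the KEPT codebase. The example call is left commented out because it downloads the model weights.

```python
MODEL_ID = "whaleloops/KEPTlongformer-PMM3"

# The example sentence from this card; the masked word is "cardiac".
SENTENCE = (
    "74F with HTN, HLD, DM2, newly diagnosed atrial fibrillation in October "
    "who was transferred to hospital for <mask> catheterization after "
    "presentation there with syncopal episode."
)


def top_mask_predictions(sentence, top_k=5):
    """Return (token, score) pairs for the single <mask> in `sentence`."""
    # Imported lazily so that defining this helper does not load transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # With exactly one mask, the pipeline returns a flat list of dicts.
    return [
        (pred["token_str"].strip(), pred["score"])
        for pred in fill_mask(sentence, top_k=top_k)
    ]


# Example call (downloads the weights on first use):
# for token, score in top_mask_predictions(SENTENCE):
#     print(f"{token}\t{score:.4f}")
```

The pipeline handles tokenization and the softmax over the vocabulary, so it is the shortest route to the same output the widget shows. Load the tokenizer and model explicitly, as above, when you need the raw logits.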

See our [GitHub repository](https://github.com/whaleloops/KEPT/tree/rerank300) for how to use this model with prompts for automatic ICD coding.

It achieves the following results:

| Metric | Score |
| ------------- | ------------- |
| rec_micro | 0.5844294992252652 |
| rec_macro | 0.12471916602840005 |
| rec_at_8 | 0.4138093882408751 |
| rec_at_75 | 0.8581874197033126 |
| rec_at_50 | 0.8109877644497351 |
| rec_at_5 | 0.2923155353947738 |
| rec_at_15 | 0.586890060777621 |
| prec_micro | 0.6537291416981642 |
| prec_macro | 0.1382069689951297 |
| prec_at_8 | 0.7835112692763938 |
| prec_at_75 | 0.20033214709371291 |
| prec_at_50 | 0.2810260972716489 |
| prec_at_5 | 0.8551008303677343 |
| prec_at_15 | 0.6288256227758008 |
| f1_micro | 0.6171399726721254 |
| f1_macro | 0.13111711325953157 |
| f1_at_8 | 0.54158310388029 |
| f1_at_75 | 0.324835806140454 |
| f1_at_50 | 0.4174099512237087 |
| f1_at_5 | 0.4356905906241822 |
| f1_at_15 | 0.6071345676658747 |
| auc_micro | 0.9653561390964384 |
| auc_macro | 0.8572490224880879 |
| acc_micro | 0.4462779749767132 |
| acc_macro | 0.09732882850157536 |
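
Each F1 row in the table is consistent with the harmonic mean of the matching precision and recall rows. For `f1_macro`, that means combining `prec_macro` and `rec_macro`, not averaging per-label F1 scores. The sketch below recomputes the F1 values from the table. The helper name `f1_from` is illustrative and is not taken from the KEPT evaluation code.

```python
import math


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall, reported F1) copied from the results table above.
reported = {
    "micro": (0.6537291416981642, 0.5844294992252652, 0.6171399726721254),
    "macro": (0.1382069689951297, 0.12471916602840005, 0.13111711325953157),
    "at_5": (0.8551008303677343, 0.2923155353947738, 0.4356905906241822),
    "at_8": (0.7835112692763938, 0.4138093882408751, 0.54158310388029),
    "at_15": (0.6288256227758008, 0.586890060777621, 0.6071345676658747),
    "at_50": (0.2810260972716489, 0.8109877644497351, 0.4174099512237087),
    "at_75": (0.20033214709371291, 0.8581874197033126, 0.324835806140454),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_from(precision, recall)
    agrees = math.isclose(recomputed, f1, abs_tol=1e-6)
    print(f"f1_{name}: reported={f1:.6f} recomputed={recomputed:.6f} agrees={agrees}")
```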