---
license: apache-2.0
language:
- de
tags:
- entity-linking
- wikidata
- umls
---

SapBERT-DE is a model for German biomedical entity linking. It was obtained by fine-tuning the multilingual entity linking model [`cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR`](https://huggingface.co/cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR) on [UMLS-Wikidata](https://zenodo.org/records/11003203), a German biomedical entity linking knowledge base.

# Usage

```python
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel

# run on GPU if available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("permediq/SapBERT-DE", use_fast=True)
model = AutoModel.from_pretrained("permediq/SapBERT-DE").to(device)
model.eval()

# entity descriptions to embed
entity_descriptions = ["Cerebellum", "Zerebellum", "Kleinhirn", "Anaesthesie"]

bs = 32  # batch size
all_embs = []

with torch.no_grad():  # inference only, no gradients needed
    for i in tqdm(range(0, len(entity_descriptions), bs)):
        toks = tokenizer.batch_encode_plus(
            entity_descriptions[i:i + bs],
            padding="max_length",
            max_length=40,  # model trained with 40 max_length
            truncation=True,
            return_tensors="pt",
        )
        toks = {k: v.to(device) for k, v in toks.items()}
        # first-token (<s>, XLM-R's [CLS]) representation as the entity embedding
        cls_rep = model(**toks)[0][:, 0, :]
        all_embs.append(cls_rep.cpu())

all_embs = torch.cat(all_embs)


def cos_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
    b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
    return torch.mm(a_norm, b_norm.transpose(0, 1))


# cosine similarity of first entity with all the entities
print(cos_sim(all_embs[0].unsqueeze(0), all_embs))
# >>> tensor([[1.0000, 0.9337, 0.6206, 0.2086]])
```
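
The snippet above only compares entity names with one another. For actual linking, a common SapBERT-style setup embeds every name (including synonyms) in the knowledge base once, then assigns each mention the concept of its nearest name by cosine similarity. What follows is a minimal sketch of that retrieval step under these assumptions. The function `link_mentions`, the toy 2-D vectors, and the concept IDs are hypothetical and are not part of the model or of the UMLS-Wikidata release.

```python
import numpy as np


def link_mentions(mention_embs, kb_embs, kb_ids, k=1):
    """Hypothetical helper: return the k nearest knowledge-base entries per mention.

    mention_embs: (n_mentions, dim) array of mention embeddings
    kb_embs:      (n_names, dim) array, one row per knowledge-base name/synonym
    kb_ids:       list of length n_names giving the concept ID of each row
    """
    # L2-normalise rows so that the dot product equals cosine similarity
    m = mention_embs / np.linalg.norm(mention_embs, axis=1, keepdims=True)
    kb = kb_embs / np.linalg.norm(kb_embs, axis=1, keepdims=True)
    scores = m @ kb.T  # shape (n_mentions, n_names)

    # indices of the k highest-scoring names per mention, best first
    top = np.argsort(-scores, axis=1)[:, :k]
    return [
        [(kb_ids[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(top)
    ]


# toy 2-D vectors standing in for SapBERT-DE embeddings (hypothetical data)
kb_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
kb_ids = ["concept_A", "concept_A", "concept_B"]  # two synonyms share concept_A
mentions = np.array([[0.95, 0.05], [0.1, 0.9]])

print(link_mentions(mentions, kb_embs, kb_ids, k=1))
```

With the real model, the inputs would be the embeddings from the usage snippet converted to NumPy, e.g. `link_mentions(all_embs[:1].numpy(), all_embs.numpy(), names)`. For large knowledge bases, an approximate nearest-neighbour index would typically replace the full `argsort`.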

# BibTeX

```bibtex
@inproceedings{mustafa-etal-2024-leveraging,
    title = "Leveraging {W}ikidata for Biomedical Entity Linking in a Low-Resource Setting: A Case Study for {G}erman",
    author = "Mustafa, Faizan E and
      Dima, Corina and
      Ochoa, Juan and
      Staab, Steffen",
    booktitle = "Proceedings of the 6th Clinical Natural Language Processing Workshop",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clinicalnlp-1.17",
    pages = "202--207",
}
```