
FrenchMedMCQA: Multiple-choice question answering on pharmacology exams using BioBERT V1.1 with Wikipedia as external knowledge and a BM25 retriever

People Involved

Affiliations

  1. LIA, NLP team, Avignon University, Avignon, France.
  2. LS2N, TALN team, Nantes University, Nantes, France.
  3. CHU Nantes, Nantes University, Nantes, France.

Demo: How to use in HuggingFace Transformers

Requires Transformers and Datasets: pip install transformers datasets

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

path_model = "qanastek/FrenchMedMCQA-BioBERT-V1.1-Wikipedia-BM25"

tokenizer = AutoTokenizer.from_pretrained(path_model)
model = AutoModelForSequenceClassification.from_pretrained(path_model)

pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=False, device=0) # device=0 uses the first GPU; set device=-1 for CPU

dataset = load_dataset("qanastek/FrenchMedMCQA")["test"]

for e in dataset:
    prediction = pipeline(e["bert_text"], truncation=True, max_length=model.config.max_position_embeddings)
    print(prediction)

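The pipeline returns a class label for each question, encoding a combination of the five options. As a minimal sketch of decoding such a prediction back to option letters, assuming a hypothetical `id2label` scheme in which selected options are joined by underscores (the real mapping lives in `model.config.id2label` and may differ):

```python
# Hypothetical id2label mapping -- the real one is in model.config.id2label
# and may use a different label scheme.
id2label = {0: "a", 1: "a_b", 2: "b_c_e"}

def decode_answer(label_id):
    """Turn a class id into the list of selected options (A-E)."""
    return [part.upper() for part in id2label[label_id].split("_")]

print(decode_answer(2))  # ['B', 'C', 'E']
```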

Training data

The questions and their associated candidate answer(s) were collected from real French pharmacy exams on the remede website. Questions and answers were manually created by medical experts and used during examinations. The dataset is composed of 2,025 questions with multiple answers and 1,080 with a single one, for a total of 3,105 questions. Each instance of the dataset contains an identifier, a question, five options (labeled from A to E) and the correct answer(s). The average question length is 14.17 tokens and the average answer length is 6.44 tokens. The vocabulary size is 13k words, of which 3.8k are estimated to be medical domain-specific (i.e. related to the medical field). We find an average of 2.49 medical domain-specific words per question (17% of the words) and 2 per answer (36% of the words). On average, a medical domain-specific word appears in 2 questions and in 8 answers.

| # Answers | Training | Validation | Test | Total |
|-----------|----------|------------|------|-------|
| 1         | 595      | 164        | 321  | 1,080 |
| 2         | 528      | 45         | 97   | 670   |
| 3         | 718      | 71         | 141  | 930   |
| 4         | 296      | 30         | 56   | 382   |
| 5         | 34       | 2          | 7    | 43    |
| Total     | 2,171    | 312        | 622  | 3,105 |
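The per-split distribution above can be reproduced by counting the number of correct answers per instance. A minimal sketch using toy records; the `correct_answers` field name follows the FrenchMedMCQA schema but is an assumption here:

```python
from collections import Counter

# Toy instances standing in for one dataset split; the real data comes
# from load_dataset("qanastek/FrenchMedMCQA").
toy_split = [
    {"id": "q1", "correct_answers": ["a"]},
    {"id": "q2", "correct_answers": ["b", "d"]},
    {"id": "q3", "correct_answers": ["a", "c", "e"]},
    {"id": "q4", "correct_answers": ["a"]},
]

# Map "number of correct answers" -> "number of questions", as in the table.
distribution = Counter(len(x["correct_answers"]) for x in toy_split)
print(dict(distribution))  # {1: 2, 2: 1, 3: 1}
```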

Evaluation results

The test corpus used for this evaluation is available on GitHub.

| Architecture     | Hamming | EMR   | Hamming | EMR   | Hamming | EMR   | Hamming | EMR   | Hamming | EMR   |
|------------------|---------|-------|---------|-------|---------|-------|---------|-------|---------|-------|
| BioBERT V1.1     | 36.19   | 15.43 | 38.72   | 16.72 | 33.33   | 14.14 | 35.13   | 16.23 | 34.27   | 13.98 |
| PubMedBERT       | 33.98   | 14.14 | 34.00   | 13.98 | 35.66   | 15.59 | 33.87   | 14.79 | 35.44   | 14.79 |
| CamemBERT-base   | 36.24   | 16.55 | 34.19   | 14.46 | 34.78   | 15.43 | 34.66   | 14.79 | 34.61   | 14.95 |
| XLM-RoBERTa-base | 37.92   | 17.20 | 31.26   | 11.89 | 35.84   | 16.07 | 32.47   | 14.63 | 33.00   | 14.95 |
| BART-base        | 31.93   | 15.91 | 34.98   | 18.64 | 33.80   | 17.68 | 29.65   | 12.86 | 34.65   | 18.32 |
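Both metrics above handle the multi-answer setting: EMR is the fraction of questions whose predicted answer set matches the gold set exactly, while the Hamming score credits partial overlap. A minimal sketch, assuming the common per-instance |intersection| / |union| definition of the Hamming score (the paper's exact formulation may differ):

```python
def hamming_score(gold, pred):
    """Average per-instance |gold & pred| / |gold | pred| over all questions."""
    scores = []
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        scores.append(len(g & p) / len(g | p) if g | p else 1.0)
    return sum(scores) / len(scores)

def exact_match_ratio(gold, pred):
    """Fraction of questions whose predicted answer set is exactly right."""
    return sum(set(g) == set(p) for g, p in zip(gold, pred)) / len(gold)

# Toy gold/predicted answer sets for three questions (options a-e).
gold = [["a"], ["b", "c"], ["a", "e"]]
pred = [["a"], ["b"], ["a", "d"]]

print(round(hamming_score(gold, pred), 3))     # 0.611
print(round(exact_match_ratio(gold, pred), 3)) # 0.333
```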

BibTeX Citations

Please cite the following works when using this model.

FrenchMedMCQA corpus and linked tools:

@unpublished{labrak:hal-03824241,
  TITLE = {{FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain}},
  AUTHOR = {Labrak, Yanis and Bazoge, Adrien and Dufour, Richard and Daille, B{\'e}atrice and Gourraud, Pierre-Antoine and Morin, Emmanuel and Rouvier, Mickael},
  URL = {https://hal.archives-ouvertes.fr/hal-03824241},
  NOTE = {working paper or preprint},
  YEAR = {2022},
  MONTH = Oct,
  PDF = {https://hal.archives-ouvertes.fr/hal-03824241/file/LOUHI_2022___QA-3.pdf},
  HAL_ID = {hal-03824241},
  HAL_VERSION = {v1},
}

HuggingFace's Transformers:

@misc{https://doi.org/10.48550/arxiv.1910.03771,
    doi = {10.48550/ARXIV.1910.03771},
    url = {https://arxiv.org/abs/1910.03771},
    author = {Wolf, Thomas and Debut, Lysandre and Sanh, Victor and Chaumond, Julien and Delangue, Clement and Moi, Anthony and Cistac, Pierric and Rault, Tim and Louf, Rémi and Funtowicz, Morgan and Davison, Joe and Shleifer, Sam and von Platen, Patrick and Ma, Clara and Jernite, Yacine and Plu, Julien and Xu, Canwen and Scao, Teven Le and Gugger, Sylvain and Drame, Mariama and Lhoest, Quentin and Rush, Alexander M.},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {HuggingFace's Transformers: State-of-the-art Natural Language Processing},
    publisher = {arXiv},
    year = {2019}, 
    copyright = {arXiv.org perpetual, non-exclusive license}
}

Acknowledgment

This work was financially supported by Zenidoc, the DIETS project financed by the Agence Nationale de la Recherche (ANR) under contract ANR-20-CE23-0005 and the ANR AIBy4 (ANR-20-THIA-0011).
