File size: 6,548 Bytes
25759de
 
 
 
 
 
 
da85b24
 
 
25759de
 
 
f11c39d
702f7cb
da85b24
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
134df93
 
 
 
5e9fcd1
793dbbb
0b3e6ee
25759de
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1b8cfca
25759de
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
datasets:
- pubmed
language:
- en
metrics:
- bleu
- exact_match
- sacrebleu
- rouge
tags:
- medical
- dialog
- arxiv:2209.12177
widget:
- text: >-
    The liver is normal in size and with normal parenchymal echogenicity with no
    sign of space-occupying lesion or bile ducts dilatation. GB is well
    distended with no stone or wall thickening. The spleen is normal in size and
    parenchymal echogenicity with no sign of space-occupying lesion. visualized
    parts of the pancreas and para-aortic area are unremarkable. Both kidneys
    are normal in size with normal cortical parenchymal echogenicity with no
    sign of the stone, stasis, or perinephric collection. ureters are not
    dilated. The urinary bladder is empty so evaluation of pelvic organs is not
    possible. no free fluid is seen in the abdominopelvic cavity.
  example_title: Sample 1
- text: >-
    Liver is normal in size, with normal echogenicity with no sign of space
    occupying lesion. GB is semi distended with two stones up to 8mm in size
    with a rim of pericholecystic fluid and positive murphy sign. CBD is normal.
    Spleen is moderately enlarged with normal parenchymal echo with no S.O.L.
    Pancreas cannot be evaluated due to severe gas shadow. RT. and LT. kidneys
    are normal in size with increase parenchymal echogenicity with no sign of
    stone, stasis or perinephric collection with two cortical cystsin the upper
    pole of left kidney. Urinary bladder is mildly distended. Moderate free
    fluid is seen in the abdominopelvic cavity at present time.
  example_title: Sample 2
inference:
  parameters:
    repetition_penalty: 1
    num_beams: 5
    early_stopping: true
    max_len: 350
license: mit
---
# ReportQL β€” Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique

*[Seyed Ali Reza Moezzi](https://scholar.google.com/citations?hl=en&user=JIZgcjAAAAAJ)*,
*[Abdolrahman Ghaedi]()*,
*[Mojdeh Rahmanian](https://scholar.google.com/citations?user=2ZtVfnUAAAAJ)*,
*[Seyedeh Zahra Mousavi](https://www.researchgate.net/scientific-contributions/Seyedeh-Zahra-Mousavi-2176375936)*,
*[Ashkan Sami](https://scholar.google.com/citations?user=zIh9AvIAAAAJ)*
<html>
<div><sub><sup>*Submitted: 16 November 2021*</sup></sub></div>
<div><sub><sup>*Revised: 20 June 2022*</sup></sub></div>
<sub><sup>*Accepted: 27 July 2022*</sup></sub>
</html>

[[paper](https://link.springer.com/article/10.1007/s10278-022-00692-x)] [[arXiv](https://arxiv.org/abs/2209.12177)] [[dataset](https://www.kaggle.com/datasets/sarme77/reportql)] [[project page](https://realsarm.github.io/ReportQL/)]

## Introduction

This repository is code release for **Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique**

<p align="center"> <img src='https://raw.githubusercontent.com/realsarm/ReportQL/main/assets/overview.png' align="center" height="320px"> </p>

Since radiology reports needed for clinical practice and research are written and stored in free-text narrations, extraction of relative information for further analysis is difficult. In these circumstances, natural language processing (NLP) techniques can facilitate automatic information extraction and transformation of free-text formats to structured data. In recent years, deep learning (DL)-based models have been adapted for NLP experiments with promising results. Despite the significant potential of DL models based on artificial neural networks (ANN) and convolutional neural networks (CNN), the models face some limitations to implement in clinical practice. Transformers, another new DL architecture, have been increasingly applied to improve the process. Therefore, in this study, we propose a transformer-based fine-grained named entity recognition (NER) architecture for clinical information extraction. We collected 88 abdominopelvic sonography reports in free-text formats and annotated them based on our developed information schema. The text-to-text transfer transformer model (T5) and Scifive, a pre-trained domain-specific adaptation of the T5 model, were applied for fine-tuning to extract entities and relations and transform the input into a structured format. Our transformer-based model in this study outperformed previously applied approaches such as ANN and CNN models based on ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores of 0.816, 0.668, 0.528, and 0.743, respectively, while providing an interpretable structured report.

## Dataset

Our annotated [dataset](https://doi.org/10.5281/zenodo.7072374) used in the paper is hosted in this repository and in [Kaggle Datasets](https://www.kaggle.com/datasets/sarme77/reportql).

The data is structured as follows:

```
data/
β”œβ”€β”€ trialReport
β”‚   └── ReportQL
β”‚       β”œβ”€β”€ Schemas
β”‚       β”‚   └── organs
β”‚       β”‚       └── simpleSchema.json
β”‚       └── dataset
β”‚           β”œβ”€β”€ test.csv
β”‚           β”œβ”€β”€ train_orig.csv
β”‚           └── training.csv
```

The `train_orig.csv` is our original training set. You can find our synthetic dataset and test set in `training.csv` and `test.csv` file.

Information schema used for annotating reports can be found in `simpleSchema.json`

## Setup

Setting up for this project involves installing dependencies.

### Setting up environments and Installing dependencies

```bash
virtualenv .venv
source .venv/bin/activate
```

### Installing dependencies

To install all the dependencies, please run the following:

```bash
pip install -r requirements.txt
```

### Fine-tuning

To start fine-tuning language model, run:

```bash
python script/fit.py
```

### Testing

For getting test results on our test set, run:

```bash
python script/test.py
```

### Inference

We prepared [a jupyter notebook](notebooks/predict_reportql.ipynb) for Inference.

## Fine-tuned Model

Our fine-tuned ReportQL weights can be accessed on πŸ€— HuggingFace.

* ReportQL: [base](https://huggingface.co/sarme/ReportQL-base)

## License

Please see the [LICENSE](LICENSE) file for details.

## Citation

If you find our work useful in your research, please consider citing us:

```
@article{moezzi2022application,
  title={Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique},
  author={Moezzi, Seyed Ali Reza and Ghaedi, Abdolrahman and Rahmanian, Mojdeh and Mousavi, Seyedeh Zahra and Sami, Ashkan},
  journal={Journal of Digital Imaging},
  pages={1--11},
  year={2022},
  publisher={Springer}
}
```