sarme committed 25759de (parent f92c8f2): Create README.md

---
datasets:
- pubmed
language:
- en
metrics:
- bleu
tags:
- medical
- dialog
---
# ReportQL – Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique

*[Seyed Ali Reza Moezzi](https://scholar.google.com/citations?hl=en&user=JIZgcjAAAAAJ)*,
*Abdolrahman Ghaedi*,
*[Mojdeh Rahmanian](https://scholar.google.com/citations?user=2ZtVfnUAAAAJ)*,
*[Seyedeh Zahra Mousavi](https://www.researchgate.net/scientific-contributions/Seyedeh-Zahra-Mousavi-2176375936)*,
*[Ashkan Sami](https://scholar.google.com/citations?user=zIh9AvIAAAAJ)*
<html>
<div><sub><sup>*Submitted: 16 November 2021*</sup></sub></div>
<div><sub><sup>*Revised: 20 June 2022*</sup></sub></div>
<div><sub><sup>*Accepted: 27 July 2022*</sup></sub></div>
</html>

[[paper](https://link.springer.com/article/10.1007/s10278-022-00692-x)] [[arXiv](https://arxiv.org/abs/2209.12177)] [[dataset](https://www.kaggle.com/datasets/sarme77/reportql)] [[project page](https://realsarm.github.io/ReportQL/)]

## Introduction

This repository contains the code release for the paper **Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique**.

<p align="center"> <img src='assets/overview.png' align="center" height="320px"> </p>

Because the radiology reports needed for clinical practice and research are written and stored as free-text narratives, extracting the relevant information for further analysis is difficult. In these circumstances, natural language processing (NLP) techniques can facilitate automatic information extraction and the transformation of free text into structured data. In recent years, deep learning (DL)-based models have been adapted for NLP tasks with promising results. Despite the significant potential of DL models based on artificial neural networks (ANN) and convolutional neural networks (CNN), these models face limitations that hinder their implementation in clinical practice. Transformers, a newer DL architecture, have increasingly been applied to improve this process. In this study, we therefore propose a transformer-based fine-grained named entity recognition (NER) architecture for clinical information extraction. We collected 88 abdominopelvic sonography reports in free-text format and annotated them according to our information schema. The Text-to-Text Transfer Transformer (T5) and SciFive, a pre-trained domain-specific adaptation of T5, were fine-tuned to extract entities and relations and to transform the input into a structured format. Our transformer-based model outperformed previously applied approaches such as ANN and CNN models, achieving ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores of 0.816, 0.668, 0.528, and 0.743, respectively, while providing an interpretable structured report.
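
To make the ROUGE-1 score above concrete, here is a minimal shell sketch of ROUGE-1 F1, i.e. the clipped unigram overlap between a generated output and a reference. It assumes lowercase whitespace tokenization and reports the F1 variant; the function name `rouge1_f1` is hypothetical, and this is not the evaluation code in `script/test.py`, which may tokenize differently or report another ROUGE variant.

```bash
# Hypothetical helper: ROUGE-1 F1 between a reference ($1) and a candidate ($2).
# Assumes lowercase, whitespace-separated tokens; not the project's evaluation code.
rouge1_f1() {
  awk -v ref="$1" -v cand="$2" 'BEGIN {
    n_ref = split(tolower(ref), r, " ")
    n_cand = split(tolower(cand), c, " ")
    # Count reference unigrams so each match is clipped to its reference frequency.
    for (i = 1; i <= n_ref; i++) ref_count[r[i]]++
    overlap = 0
    for (i = 1; i <= n_cand; i++) {
      if (ref_count[c[i]] > 0) { overlap++; ref_count[c[i]]-- }
    }
    if (overlap == 0) { print "0.000"; exit }
    precision = overlap / n_cand
    recall = overlap / n_ref
    printf "%.3f\n", 2 * precision * recall / (precision + recall)
  }'
}

rouge1_f1 "liver size is normal" "liver is normal"  # prints 0.857
```

Clipping matters: a candidate that repeats a word more often than the reference does gets credit only up to the reference count.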

## Dataset

The annotated [dataset](https://doi.org/10.5281/zenodo.7072374) used in the paper is hosted in this repository and on [Kaggle Datasets](https://www.kaggle.com/datasets/sarme77/reportql).

The data is structured as follows:

```
data/
├── trialReport
│   └── ReportQL
│       ├── Schemas
│       │   └── organs
│       │       └── simpleSchema.json
│       └── dataset
│           ├── test.csv
│           ├── train_orig.csv
│           └── training.csv
```

`train_orig.csv` is our original training set; `training.csv` contains our synthetic dataset, and `test.csv` contains the test set.

The information schema used to annotate the reports is in `simpleSchema.json`.
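
Because the reports are free text, a quoted CSV field may span several lines, so `wc -l` can over-count records. The sketch below instead counts records per split with Python's `csv` module. It assumes UTF-8 files, one header row per file, and the default path from the tree above; the function name `count_records` is hypothetical.

```bash
# Hypothetical helper: count records in each dataset split.
# Assumes UTF-8 files with one header row each; the default path follows the tree above.
count_records() {
  local dataset_dir="${1:-data/trialReport/ReportQL/dataset}"
  python3 - "$dataset_dir" <<'EOF'
import csv
import sys
from pathlib import Path

root = Path(sys.argv[1])
for name in ("train_orig.csv", "training.csv", "test.csv"):
    # csv.reader keeps quoted multi-line report text in a single record.
    with open(root / name, newline="", encoding="utf-8") as f:
        n_rows = sum(1 for _ in csv.reader(f))
    # Subtract the assumed header row.
    print(f"{name}: {max(n_rows - 1, 0)}")
EOF
}
```

Run `count_records` from the repository root, or pass another dataset directory as the first argument.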

## Setup

Setting up this project involves creating a virtual environment and installing the dependencies.

### Setting up the environment

```bash
virtualenv .venv
source .venv/bin/activate
```

### Installing dependencies

To install all the dependencies, run:

```bash
pip install -r requirements.txt
```

### Fine-tuning

To fine-tune the language model, run:

```bash
python script/fit.py
```

### Testing

To evaluate the model on our test set, run:

```bash
python script/test.py
```

### Inference

For inference, see the [Jupyter notebook](notebooks/predict_reportql.ipynb).

## Fine-tuned Model

Our fine-tuned ReportQL weights are available on 🤗 Hugging Face:

* ReportQL: [base](https://huggingface.co/sarme/ReportQL-base)
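
As an alternative to the notebook, the `sarme/ReportQL-base` checkpoint can presumably be loaded with the 🤗 `transformers` library. This sketch assumes the checkpoint is a T5-style sequence-to-sequence model (consistent with the T5 and SciFive models described above), that `transformers` and PyTorch are installed, and that the raw report is passed without a task prefix; the notebook may format inputs differently. The function name `reportql_predict` is hypothetical.

```bash
# Hypothetical helper: run the ReportQL-base checkpoint on one free-text report.
# Assumes `transformers` and PyTorch are installed and the checkpoint is a
# T5-style seq2seq model; the notebook may format the input differently.
reportql_predict() {
  if [ "$#" -ne 1 ]; then
    echo 'usage: reportql_predict "<free-text report>"' >&2
    return 2
  fi
  python3 - "$1" <<'EOF'
import sys

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "sarme/ReportQL-base"  # checkpoint linked in the Fine-tuned Model section
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: the raw report text is passed without a task prefix.
inputs = tokenizer(sys.argv[1], return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
EOF
}
```

Usage: `reportql_predict "The liver is normal in size and echogenicity."`. The first call downloads the weights from the Hugging Face Hub.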

## License

Please see the [LICENSE](LICENSE) file for details.

## Citation

If you find our work useful in your research, please consider citing us:

```bibtex
@article{moezzi2022application,
  title={Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique},
  author={Moezzi, Seyed Ali Reza and Ghaedi, Abdolrahman and Rahmanian, Mojdeh and Mousavi, Seyedeh Zahra and Sami, Ashkan},
  journal={Journal of Digital Imaging},
  pages={1--11},
  year={2022},
  publisher={Springer}
}
```