|
""" |
|
This file includes all the constant content shown in the app |
|
""" |
|
|
|
|
|
|
|
summary_text = (''' |
|
This application allows you to make **activity predictions** for |
|
**biological targets** for which you have only a **little knowledge** in |
|
terms of known active and inactive molecules. |
|
|
|
**Provide** via the sidebar:\n |
|
- some active molecules, |
|
- some inactive molecules, and |
|
- molecules you want to predict. |
|
|
|
Hit **Predict** and explore the predictions! |
|
|
|
For more **information** about the **model** and **how to provide the |
|
molecules**, please visit the **Additional Information** tab. |
|
|
|
If you encounter any problems, we would be glad if you could report them |
|
to us: **[email protected]**. |
|
''') |
|
|
|
mhnfs_text =(''' |
|
<div style="text-align: justify"> |
|
<b>MHNfs</b> is a few-shot drug discovery model which consists of a <b>context |
|
module</b> , a <b>cross-attention module</b> , and a <b>similarity module</b> |
|
as described here: <a href="https://openreview.net/pdf?id=XrMWUuEevr" |
|
target="_blank">https://openreview.net/pdf?id=XrMWUuEevr</a>. |
|
</div> |
|
<br> |
|
|
|
<div style="text-align: justify"> |
|
<b>Abstract</b>. A central task in computational drug discovery is to construct |
|
models from known active molecules to find further promising molecules for |
|
subsequent screening. However, typically only very few active molecules are |
|
known. Therefore, few-shot learning methods have the potential to improve the |
|
effectiveness of this critical phase of the drug discovery process. We introduce |
|
a new method for few-shot drug discovery. Its main idea is to enrich a molecule |
|
representation by knowledge about known context or reference molecules. Our |
|
novel concept for molecule representation enrichment is to associate molecules |
|
from both the support set and the query set with a large set of reference |
|
(context) molecules through a modern Hopfield network. Intuitively, this |
|
enrichment step is analogous to a human expert who would associate a given |
|
molecule with familiar molecules whose properties are known. The enrichment step |
|
reinforces and amplifies the covariance structure of the data, while |
|
simultaneously removing spurious correlations arising from the decoration of |
|
molecules. Our approach is compared with other few-shot methods for drug |
|
discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms |
|
all compared methods and therefore sets a new state-of-the art for few-shot |
|
learning in drug discovery. An ablation study shows that the enrichment step of |
|
our method is the key to improve the predictive quality. In a domain shift |
|
experiment, we further demonstrate the robustness of our method. Code is |
|
available at <a href="https://github.com/ml-jku/MHNfs" |
|
target="_blank">https://github.com/ml-jku/MHNfs</a>. |
|
</div> |
|
<br> |
|
<br> |
|
''') |
|
|
|
citation_text = ''' |
|
### |
|
@inproceedings{ |
|
schimunek2023contextenriched, |
|
title={Context-enriched molecule representations improve few-shot drug discovery}, |
|
author={Johannes Schimunek and Philipp Seidl and Lukas Friedrich and Daniel Kuhn and Friedrich Rippmann and Sepp Hochreiter and Günter |
|
Klambauer}, |
|
booktitle={The Eleventh International Conference on Learning Representations}, |
|
year={2023}, |
|
url={https://openreview.net/forum?id=XrMWUuEevr} |
|
} |
|
''' |
|
|
|
few_shot_learning_text = ( |
|
''' |
|
<div style="text-align: justify"> |
|
<b>Few-shot learning</b> is a machine learning sub-field which aims to provide |
|
predictive models for scenarios in which only little data is known/available.<br> |
|
<br> |
|
|
|
<b>MHNfs</b> is a few-shot learning model which is specifically designed for drug |
|
discovery applications. It is built to use the input prompts in a way such that |
|
the provided available knowledge, i.e. the known active and inactive molecules, |
|
functions as context to predict the activity of the new requested molecules. |
|
Precisely, the provided active and inactive molecules are associated with a |
|
large set of general molecules - called context molecules - to enrich the |
|
provided information and to remove spurious correlations arising from the |
|
decoration of molecules. This is analogous to a Large Language Model which would |
|
not only use the provided information in the current prompt as context but would |
|
also have access to way more information, e.g., a prompting history. |
|
</div> |
|
''') |
|
|
|
under_the_hood_text = (''' |
|
<div style="text-align: justify"> |
|
The predictive model (MHNfs) used in this application was specifically designed and |
|
trained for low-data scenarios. The model predicts whether a molecule is active or |
|
inactive. The predicted activity value is a continuous value between 0 and 1, and, |
|
similar to a probability, the higher/lower the value, the more confident the model |
|
is that the molecule is active/inactive. |
|
|
|
The model was trained on the FS-Mol dataset which |
|
includes 5120 tasks (roughly 5000 tasks were used for training, rest for evaluation). |
|
The training tasks are listed here: |
|
<a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets" |
|
target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>. |
|
</div> |
|
''') |
|
|
|
usage_text = (''' |
|
<div style="text-align: justify"> |
|
To use this application, you need to provide <b>3 different sets of molecules</b>: |
|
<ol> |
|
<li><b>active</b> molecules: set of known active molecules,</li> |
|
<li><b>inactive</b> molecules: set of known inactive molecules, and</li> |
|
<li>molecules to <b>predict</b>: set of molecules you want to predict.</li> |
|
</ol> |
|
These three sets can be provided via the <b>sidebar</b>. The sidebar also includes two |
|
buttons <b>predict</b> and <b>reset</b> to run the prediction pipeline and to |
|
reset it. |
|
</div> |
|
''') |
|
|
|
data_text = (''' |
|
<div style="text-align: justify"> |
|
<ul> |
|
<li> Molecules have to be provided in SMILES format</li> |
|
<li> For each input, the maximum number of molecules which can be provided is |
|
restricted to 20 </li> |
|
<li> You can provide the molecules via the text boxes or via CSV upload |
|
<ul> |
|
<li> Text box |
|
<ul> |
|
<li> Replace the pseudo input by directly typing your molecules |
|
into |
|
the text box </li> |
|
<li> Separate the molecules by comma </li> |
|
</ul> |
|
</li> |
|
<li> CSV upload |
|
<ul> |
|
<li> The CSV file should include a "smiles" column (both upper |
|
and lower case "SMILES" are accepted) </li> |
|
<li> All other columns will be ignored </li> |
|
<li> Examples are provided here: |
|
<div style="background-color: #efefef"> |
|
assets/example_csv/ </li> |
|
</div> |
|
</ul> |
|
</li> |
|
</ul> |
|
</li> |
|
</ul> |
|
</div> |
|
''') |
|
|
|
trust_text = (''' |
|
<div style="text-align: justify"> |
|
Just like all other machine learning models, the performance of MHNfs varies |
|
and, generally, the model works well if the task is somehow close to tasks which |
|
were used to train the model. The model performance for very different tasks is |
|
unclear and might be poor.<br> |
|
<br> |
|
|
|
MHNfs was trained on the FS-Mol dataset which includes 5120 tasks (roughly |
|
5000 tasks were used for training, rest for evaluation). The training tasks are |
|
listed here: <a href= https://github.com/microsoft/FS-Mol/tree/main/datasets/targets |
|
target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>. |
|
</div> |
|
''') |
|
|
|
example_trustworthy_text = (''' |
|
<div style="text-align: justify"> |
|
Since the predicitve model has seen a lot of kinase related tasks during training, |
|
the model is expected to generally perform well on kinase targets. For this example, |
|
we use data for the target |
|
<a href=https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL5914/ |
|
target="_blank">CHEMBL5914</a>. Notably, this specific kinase has not been seen |
|
during training. Precisely, we use the available inhibition data while molecules |
|
with an inhibition value greater (smaller) than 50 % are considered as active |
|
(inactive).<br> |
|
|
|
From the known available data, we have selected 4 "known" active molecules, |
|
8 "known" inactive molecules, and 11 molecules to predict.<br> |
|
|
|
<b>Molecules to predict</b>: |
|
<div style="background-color: #efefef"> |
|
FC(F)(F)c1ccc(Cl)cc1CN1CCNc2ncc(-c3ccnc(N4CCNCC4)c3)cc21,<br> |
|
CS(=O)(=O)c1ccc(-n2nc(-c3cnc4[nH]ccc4c3)c3c(N)ncnc32)cc1,<br> |
|
O=C(Nc1ccccc1Cl)c1cnc2ccc(C3CCNCC3)cn12.O=C(O)C(=O)O,<br> |
|
CC(C)n1cnc2c(Nc3cccc(Cl)c3)nc(N[C@@H]3CCCC[C@@H]3N)nc21,<br> |
|
Nc1ncc(-c2ccc(NS(=O)(=O)C3CC3)cc2F)cc1-c1ccc2c(c1)CCNC2=O,<br> |
|
CCN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(C#Cc4cccnc4)c3)cc2C(F)(F)F)CC1,<br> |
|
CN1CCN(c2ccc(-c3cnc4c(c3)N(Cc3cc(Cl)ccc3C(F)(F)F)CCN4)cn2)CC1,<br> |
|
CC(C)n1nc(-c2cnc(N)c(OC(F)(F)F)c2)cc1[C@H]1[C@@H]2CN(C3COC3)C[C@@H]21,<br> |
|
Nc1ncc(-c2cc([C@H]3[C@@H]4CN(C5COC5)C[C@@H]43)n(CC3CC3)n2)cc1C(F)(F)F,<br> |
|
Cc1ccc(NC(=O)C2(C(=O)Nc3ccc(Nc4ncc(F)c(-c5cc(F)c6nc(C)n(C(C)C)c6c5)n4)cc3)CC2)cc1,<br> |
|
C[C@@H](Oc1cc(-c2cnn(C3CCNCC3)c2)cnc1N)c1c(Cl)ccc(F)c1Cl |
|
</div><br> |
|
|
|
<b>Known active molecules</b>: |
|
<div style="background-color: #efefef"> |
|
CC(=O)N1CCN(c2cc(-c3cnc4c(c3)N(Cc3cc(Cl)ccc3C(F)(F)F)CCN4)ccn2)CC1,<br> |
|
CS(=O)(=O)c1cccc(Nc2nccc(-c3sc(N4CCOCC4)nc3-c3cccc(NS(=O)(=O)c4c(F)cccc4F)c3)n2)c1,<br> |
|
COc1cnccc1Nc1nc(-c2nn(Cc3c(F)cc(OCCO)cc3F)c3ccccc23)ncc1OC,<br> |
|
CN(C)[C@@H]1CC[C@@]2(C)[C@@H](CC[C@@H]3[C@@H]2CC[C@]2(C)C(c4cccc5cnccc45)=CC[C@@H]32)C1<br> |
|
</div><br> |
|
|
|
<b>Known inactive molecules</b>: |
|
<div style="background-color: #efefef"> |
|
c1cc(-c2c[nH]c3cnccc23)ccn1,<br> |
|
COc1ccc2c3ccnc(C(F)(F)F)c3n(CCCCN)c2c1,<br> |
|
CNS(=O)(=O)c1ccc(N(C)C)c(Nc2ncnc3cc(OC)c(OC)cc23)c1,<br> |
|
CN(C1CC1)S(=O)(=O)c1ccc(-c2cnc(N)c(-c3ccc4c(c3)CCNC4=O)c2)c(F)c1,<br> |
|
CCN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(C#Cc4cnc5[nH]ccc5c4)c3)cc2C(F)(F)F)CC1,<br> |
|
CC(C)n1cc(-c2cc(-c3ccc(CN4CCOCC4)cc3)cnc2N)nn1,<br> |
|
CC(C)(O)[C@H](F)CN1Cc2cc(NC(=O)c3cnn4cccnc34)c(N3CCOCC3)cc2C1=O,<br> |
|
[2H]C([2H])([2H])C1(C([2H])([2H])[2H])Cn2nc(-c3ccc(F)cn3)c(-c3ccnc4[nH]ncc34)c2CO1<br> |
|
</div><br> |
|
|
|
<b>Predictions</b>:<br> |
|
|
|
</div> |
|
''') |
|
|
|
example_nottrustworthy_text = (''' |
|
<div style="text-align: justify"> |
|
For this example, we use data for the auxiliary transport protein target |
|
<a href=https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL5738/ |
|
target="_blank">CHEMBL5738</a>. Precisely, we use the available Ki data |
|
while molecules with a pCHEMBL value greater (smaller) than 5 are considered |
|
as active (inactive).<br> |
|
|
|
From the known available data, we have selected 4 "known" active molecules, |
|
3 "known" inactive molecules, and 10 molecules to predict.<br> |
|
|
|
<b>Molecules to predict</b>: |
|
<div style="background-color: #efefef"> |
|
CC(C(=O)O)c1ccc(-c2ccccc2)c(F)c1,<br> |
|
O=S(=O)(O)Oc1cccc2cccc(Nc3ccccc3)c12,<br> |
|
CCCCCCCC/C=C\CCCCCCCC(=O)O,<br> |
|
C[C@]12C=CC(=O)C=C1CC[C@@H]1[C@@H]2[C@@H](O)C[C@@]2(C)[C@H]1CC[C@]2(O)C(=O)CO,<br> |
|
CCOC(=O)C(C)(C)Oc1ccc(Cl)cc1,<br> |
|
Cc1ccc(Cl)c(Nc2ccccc2C(=O)O)c1Cl,<br> |
|
O=C(O)Cc1ccccc1Nc1c(Cl)cccc1Cl,<br> |
|
CC(C)(Oc1ccc(CCNC(=O)c2ccc(Cl)cc2)cc1)C(=O)O,<br> |
|
O=C(c1ccccc1)c1ccc2n1CCC2C(=O)O,<br> |
|
CC(C)OC(=O)C(C)(C)Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1<br> |
|
</div><br> |
|
|
|
<b>Known active molecules</b>: |
|
<div style="background-color: #efefef"> |
|
CC(C(=O)O)c1ccc(N2Cc3ccccc3C2=O)cc1,<br> |
|
CN1C(=O)CN=C(c2ccccc2)c2cc(Cl)ccc21,<br> |
|
CC(C)(Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1)C(=O)O,<br> |
|
CC(=O)[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C |
|
|
|
</div><br> |
|
|
|
<b>Known inactive molecules</b>: |
|
<div style="background-color: #efefef"> |
|
CC(C)Cc1ccc(C(C)C(=O)O)cc1,<br> |
|
O=C1Nc2ccc(Cl)cc2C(c2ccccc2Cl)=NC1O,<br> |
|
C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO |
|
</div><br> |
|
|
|
<b>Predictions</b>:<br> |
|
|
|
</div> |
|
''') |