"""
This file includes all the constant content shown in the app 
"""

# --------------------------------------------------------------------------------------

summary_text = ('''
                This application allows you to make **activity predictions** for 
                **biological targets** for which you have only **limited knowledge** in 
                terms of known active and inactive molecules.
                
                **Provide** via the sidebar:\n
                - some active molecules,
                - some inactive molecules, and
                - molecules you want to predict.
                
                Hit **Predict** and explore the predictions!
                
                For more **information** about the **model** and **how to provide the 
                molecules**, please visit the **Additional Information** tab.

                If you encounter any problems, we would be glad if you could report them
                to us: **[email protected]**.
                ''')

mhnfs_text = ('''
        <div style="text-align: justify"> 
        <b>MHNfs</b> is a few-shot drug discovery model which consists of a <b>context 
        module</b>, a <b>cross-attention module</b>, and a <b>similarity module</b> 
        as described here: <a href="https://openreview.net/pdf?id=XrMWUuEevr" 
        target="_blank">https://openreview.net/pdf?id=XrMWUuEevr</a>.
        </div>    
        <br>

        <div style="text-align: justify"> 
        <b>Abstract</b>. A central task in computational drug discovery is to construct 
        models from known active molecules to find further promising molecules for 
        subsequent screening. However, typically only very few active molecules are 
        known. Therefore, few-shot learning methods have the potential to improve the 
        effectiveness of this critical phase of the drug discovery process. We introduce 
        a new method for few-shot drug discovery. Its main idea is to enrich a molecule 
        representation by knowledge about known context or reference molecules. Our 
        novel concept for molecule representation enrichment is to associate molecules 
        from both the support set and the query set with a large set of reference 
        (context) molecules through a modern Hopfield network. Intuitively, this 
        enrichment step is analogous to a human expert who would associate a given 
        molecule with familiar molecules whose properties are known. The enrichment step 
        reinforces and amplifies the covariance structure of the data, while 
        simultaneously removing spurious correlations arising from the decoration of 
        molecules. Our approach is compared with other few-shot methods for drug 
        discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms 
        all compared methods and therefore sets a new state-of-the-art for few-shot 
        learning in drug discovery. An ablation study shows that the enrichment step of 
        our method is the key to improve the predictive quality. In a domain shift 
        experiment, we further demonstrate the robustness of our method. Code is 
        available at <a href="https://github.com/ml-jku/MHNfs" 
        target="_blank">https://github.com/ml-jku/MHNfs</a>.
        </div>
        <br>
        <br>
        ''')

citation_text = '''
        ### 
            @inproceedings{
                schimunek2023contextenriched,
                title={Context-enriched molecule representations improve few-shot drug discovery},
                author={Johannes Schimunek and Philipp Seidl and Lukas Friedrich and Daniel Kuhn and Friedrich Rippmann and Sepp Hochreiter and Günter Klambauer},
                booktitle={The Eleventh International Conference on Learning Representations},
                year={2023},
                url={https://openreview.net/forum?id=XrMWUuEevr}
            }  
        '''

few_shot_learning_text = (
    '''
    <div style="text-align: justify">
    <b>Few-shot learning</b> is a machine learning sub-field which aims to provide 
    predictive models for scenarios in which only a small amount of data is available.<br>
    <br>
    
    <b>MHNfs</b> is a few-shot learning model which is specifically designed for drug
    discovery applications. It is built to use the input such that the provided 
    knowledge, i.e. the known active and inactive molecules, functions as context 
    to predict the activity of the newly requested molecules. 
    More precisely, the provided active and inactive molecules are associated with a
    large set of general molecules, called context molecules, to enrich the 
    provided information and to remove spurious correlations arising from the 
    decoration of molecules. This is analogous to a Large Language Model which would
    not only use the information in the current prompt as context but would
    also have access to much more information, e.g., a prompting history.
    </div>                          
    ''')
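
# Illustrative sketch only (not the MHNfs implementation): the association with
# context molecules described above can be pictured as a softmax-weighted lookup,
# in the spirit of a modern Hopfield network. The helper names, the toy
# 2-dimensional embeddings, the beta parameter, and the residual update are all
# assumptions made for this example; the real model uses learned projections
# that are omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enrich_with_context(molecule, context, beta=1.0):
    """Hypothetical helper: add a softmax-weighted average of the context
    embeddings to one molecule embedding (residual update, an assumption)."""
    # Scaled dot-product similarity to every context molecule
    scores = [beta * sum(m * c for m, c in zip(molecule, ctx)) for ctx in context]
    weights = softmax(scores)
    # Weighted average of the context embeddings, one value per dimension
    retrieved = [
        sum(w * ctx[d] for w, ctx in zip(weights, context))
        for d in range(len(molecule))
    ]
    return [m + r for m, r in zip(molecule, retrieved)]


# Toy embeddings, invented for illustration only
context_set = [[1.0, 0.0], [0.0, 1.0]]
enriched = enrich_with_context([1.0, 0.0], context_set, beta=1.0)
```

# The molecule is pulled towards the context molecule it resembles most; a
# larger beta makes the lookup sharper, a smaller beta averages more broadly.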

under_the_hood_text = ('''
    <div style="text-align: justify">
    The predictive model (MHNfs) used in this application was specifically designed and 
    trained for low-data scenarios. The model predicts whether a molecule is active or
    inactive. The predicted activity value is a continuous value between 0 and 1, and, 
    similar to a probability, the higher/lower the value, the more confident the model 
    is that the molecule is active/inactive.
    
    The model was trained on the FS-Mol dataset which 
    includes 5120 tasks (roughly 5000 tasks were used for training, the rest for evaluation). 
    The training tasks are listed here:
    <a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets" 
    target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
    </div>  
    ''')
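
# A minimal sketch of how the predicted activity values described above might be
# read. The helper name, the 0.5 cutoff, and the example scores are assumptions
# for illustration; the app itself does not prescribe a cutoff.

```python
def interpret_score(score, threshold=0.5):
    """Hypothetical helper: map a predicted activity value in [0, 1] to a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a value in [0, 1], got {score}")
    return "active" if score >= threshold else "inactive"


# Invented scores, keyed by SMILES string
predictions = {"CCO": 0.12, "c1ccccc1": 0.81, "CC(=O)O": 0.47}

# Rank molecules from most to least likely active
ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)

# Hard labels under the assumed 0.5 cutoff
labels = {smiles: interpret_score(score) for smiles, score in predictions.items()}
```

# Ranking by score is often more informative than hard labels, because values
# close to the cutoff indicate low confidence in either direction.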

usage_text = ('''
    <div style="text-align: justify">
    To use this application, you need to provide <b>3 different sets of molecules</b>:
    <ol>
        <li><b>active</b> molecules: set of known active molecules,</li>
        <li><b>inactive</b> molecules: set of known inactive molecules, and</li>
        <li>molecules to <b>predict</b>: set of molecules you want to predict.</li>
    </ol>
    These three sets can be provided via the <b>sidebar</b>. The sidebar also includes two
    buttons, <b>predict</b> and <b>reset</b>, to run the prediction pipeline and to 
    reset it.
    </div>
    ''')

data_text = ('''
    <div style="text-align: justify">
    <ul>
        <li> Molecules have to be provided in SMILES format</li>
        <li> For each input, at most 20 molecules can be provided </li>
        <li> You can provide the molecules via the text boxes or via CSV upload
            <ul>
                <li> Text box
                    <ul>
                        <li> Replace the placeholder input by typing your molecules 
                        directly into the text box </li>
                        <li> Separate the molecules with commas </li>
                    </ul>
                </li>
                <li> CSV upload
                    <ul>
                        <li> The CSV file should include a "smiles" column (both upper 
                        and lower case "SMILES" are accepted) </li>
                        <li> All other columns will be ignored </li>
                        <li> Examples are provided here: 
                            <div style="background-color: #efefef">
                            assets/example_csv/
                            </div>
                        </li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
    </div>
    ''')
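
# A minimal sketch of the input rules listed above, not the app's actual parsing
# code. The function names and MAX_MOLECULES are hypothetical. The column match
# is case-insensitive, which accepts "smiles" and "SMILES" as described and,
# as an assumption, other capitalisations too.

```python
import csv
import io

MAX_MOLECULES = 20  # per-input limit stated in the text above


def _check_limit(smiles):
    """Reject inputs that exceed the per-input molecule limit."""
    if len(smiles) > MAX_MOLECULES:
        raise ValueError(f"at most {MAX_MOLECULES} molecules allowed, got {len(smiles)}")
    return smiles


def parse_text_box(text):
    """Split comma-separated SMILES typed into a text box."""
    # Surrounding whitespace and line breaks are dropped, empty entries skipped
    return _check_limit([s.strip() for s in text.split(",") if s.strip()])


def parse_csv(content):
    """Read the 'smiles' column from CSV text; all other columns are ignored."""
    reader = csv.DictReader(io.StringIO(content))
    column = next(
        (name for name in (reader.fieldnames or []) if name.strip().lower() == "smiles"),
        None,
    )
    if column is None:
        raise ValueError("no 'smiles' column found")
    values = [(row[column] or "").strip() for row in reader]
    return _check_limit([v for v in values if v])


from_text = parse_text_box("CCO, c1ccccc1,\nCC(=O)O")
from_csv = parse_csv("id,SMILES\n1,CCO\n2,c1ccccc1\n")
```

# This sketch only checks the format; it does not verify that each entry is a
# chemically valid SMILES string.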

trust_text = ('''
    <div style="text-align: justify">
    Like all machine learning models, the performance of MHNfs varies. 
    Generally, the model works well if the task is similar to tasks which 
    were used to train the model. The model performance for very different tasks is 
    unclear and might be poor.<br>
    <br>
        
    MHNfs was trained on the FS-Mol dataset which includes 5120 tasks (roughly 
    5000 tasks were used for training, the rest for evaluation). The training tasks are 
    listed here: <a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets"
    target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
    </div>
    ''')

example_trustworthy_text = ('''
    <div style="text-align: justify">
    Since the predictive model has seen many kinase-related tasks during training, 
    the model is expected to generally perform well on kinase targets. For this example, 
    we use data for the target 
    <a href="https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL5914/"
    target="_blank">CHEMBL5914</a>. Notably, this specific kinase has not been seen 
    during training. More precisely, we use the available inhibition data, where molecules 
    with an inhibition value greater (smaller) than 50 % are considered active 
    (inactive).<br>
    
    From the known available data, we have selected 4 "known" active molecules, 
    8 "known" inactive molecules, and 11 molecules to predict.<br>
    
    <b>Molecules to predict</b>:
    <div style="background-color: #efefef">
    FC(F)(F)c1ccc(Cl)cc1CN1CCNc2ncc(-c3ccnc(N4CCNCC4)c3)cc21,<br>
    CS(=O)(=O)c1ccc(-n2nc(-c3cnc4[nH]ccc4c3)c3c(N)ncnc32)cc1,<br>
    O=C(Nc1ccccc1Cl)c1cnc2ccc(C3CCNCC3)cn12.O=C(O)C(=O)O,<br>
    CC(C)n1cnc2c(Nc3cccc(Cl)c3)nc(N[C@@H]3CCCC[C@@H]3N)nc21,<br>
    Nc1ncc(-c2ccc(NS(=O)(=O)C3CC3)cc2F)cc1-c1ccc2c(c1)CCNC2=O,<br>
    CCN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(C#Cc4cccnc4)c3)cc2C(F)(F)F)CC1,<br>
    CN1CCN(c2ccc(-c3cnc4c(c3)N(Cc3cc(Cl)ccc3C(F)(F)F)CCN4)cn2)CC1,<br>
    CC(C)n1nc(-c2cnc(N)c(OC(F)(F)F)c2)cc1[C@H]1[C@@H]2CN(C3COC3)C[C@@H]21,<br>
    Nc1ncc(-c2cc([C@H]3[C@@H]4CN(C5COC5)C[C@@H]43)n(CC3CC3)n2)cc1C(F)(F)F,<br>
    Cc1ccc(NC(=O)C2(C(=O)Nc3ccc(Nc4ncc(F)c(-c5cc(F)c6nc(C)n(C(C)C)c6c5)n4)cc3)CC2)cc1,<br>
    C[C@@H](Oc1cc(-c2cnn(C3CCNCC3)c2)cnc1N)c1c(Cl)ccc(F)c1Cl
    </div><br>
    
    <b>Known active molecules</b>:
    <div style="background-color: #efefef">
    CC(=O)N1CCN(c2cc(-c3cnc4c(c3)N(Cc3cc(Cl)ccc3C(F)(F)F)CCN4)ccn2)CC1,<br>
    CS(=O)(=O)c1cccc(Nc2nccc(-c3sc(N4CCOCC4)nc3-c3cccc(NS(=O)(=O)c4c(F)cccc4F)c3)n2)c1,<br>
    COc1cnccc1Nc1nc(-c2nn(Cc3c(F)cc(OCCO)cc3F)c3ccccc23)ncc1OC,<br>
    CN(C)[C@@H]1CC[C@@]2(C)[C@@H](CC[C@@H]3[C@@H]2CC[C@]2(C)C(c4cccc5cnccc45)=CC[C@@H]32)C1<br>
    </div><br>
    
    <b>Known inactive molecules</b>:
    <div style="background-color: #efefef">
    c1cc(-c2c[nH]c3cnccc23)ccn1,<br>
    COc1ccc2c3ccnc(C(F)(F)F)c3n(CCCCN)c2c1,<br>
    CNS(=O)(=O)c1ccc(N(C)C)c(Nc2ncnc3cc(OC)c(OC)cc23)c1,<br>
    CN(C1CC1)S(=O)(=O)c1ccc(-c2cnc(N)c(-c3ccc4c(c3)CCNC4=O)c2)c(F)c1,<br>
    CCN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(C#Cc4cnc5[nH]ccc5c4)c3)cc2C(F)(F)F)CC1,<br>
    CC(C)n1cc(-c2cc(-c3ccc(CN4CCOCC4)cc3)cnc2N)nn1,<br>
    CC(C)(O)[C@H](F)CN1Cc2cc(NC(=O)c3cnn4cccnc34)c(N3CCOCC3)cc2C1=O,<br>
    [2H]C([2H])([2H])C1(C([2H])([2H])[2H])Cn2nc(-c3ccc(F)cn3)c(-c3ccnc4[nH]ncc34)c2CO1<br>
    </div><br>
    
    <b>Predictions</b>:<br>
    
    </div> 
    ''')
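
# A minimal sketch of the labelling rule described above (inhibition greater
# than 50 % means active, smaller means inactive). The helper name and the
# measurement values are invented for illustration; how values exactly at the
# threshold are treated is not stated in the text, so they are left out here.

```python
def split_by_threshold(measurements, threshold=50.0):
    """Hypothetical helper: split (smiles, value) pairs into actives and inactives."""
    actives = [smiles for smiles, value in measurements if value > threshold]
    inactives = [smiles for smiles, value in measurements if value < threshold]
    return actives, inactives


# Invented inhibition values in percent, not real assay data
toy_measurements = [("CCO", 12.0), ("c1ccccc1", 87.5), ("CC(=O)O", 50.0)]
actives, inactives = split_by_threshold(toy_measurements)
```

# The same helper works for other readouts by changing the threshold.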

example_nottrustworthy_text = ('''
    <div style="text-align: justify">
    For this example, we use data for the auxiliary transport protein target 
    <a href="https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL5738/"
    target="_blank">CHEMBL5738</a>. More precisely, we use the available Ki data, 
    where molecules with a pCHEMBL value greater (smaller) than 5 are considered 
    active (inactive).<br>
    
    From the known available data, we have selected 4 "known" active molecules, 
    3 "known" inactive molecules, and 10 molecules to predict.<br>
    
    <b>Molecules to predict</b>:
    <div style="background-color: #efefef">
    CC(C(=O)O)c1ccc(-c2ccccc2)c(F)c1,<br>
    O=S(=O)(O)Oc1cccc2cccc(Nc3ccccc3)c12,<br>
    CCCCCCCC/C=C\CCCCCCCC(=O)O,<br>
    C[C@]12C=CC(=O)C=C1CC[C@@H]1[C@@H]2[C@@H](O)C[C@@]2(C)[C@H]1CC[C@]2(O)C(=O)CO,<br>
    CCOC(=O)C(C)(C)Oc1ccc(Cl)cc1,<br>
    Cc1ccc(Cl)c(Nc2ccccc2C(=O)O)c1Cl,<br>
    O=C(O)Cc1ccccc1Nc1c(Cl)cccc1Cl,<br>
    CC(C)(Oc1ccc(CCNC(=O)c2ccc(Cl)cc2)cc1)C(=O)O,<br>
    O=C(c1ccccc1)c1ccc2n1CCC2C(=O)O,<br>
    CC(C)OC(=O)C(C)(C)Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1<br>
    </div><br>
    
    <b>Known active molecules</b>:
    <div style="background-color: #efefef">
    CC(C(=O)O)c1ccc(N2Cc3ccccc3C2=O)cc1,<br>
    CN1C(=O)CN=C(c2ccccc2)c2cc(Cl)ccc21,<br>
    CC(C)(Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1)C(=O)O,<br>
    CC(=O)[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C

    </div><br>
    
    <b>Known inactive molecules</b>:
    <div style="background-color: #efefef">
    CC(C)Cc1ccc(C(C)C(=O)O)cc1,<br>
    O=C1Nc2ccc(Cl)cc2C(c2ccccc2Cl)=NC1O,<br>
    C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO
    </div><br>
    
    <b>Predictions</b>:<br>
    
    </div> 
    ''')