mhnfs / src /app /constants.py

Tschoui

Update src/app/constants.py

ebb90db verified 8 months ago


	"""
	This file includes all the constant content shown in the app
	"""

	# --------------------------------------------------------------------------------------

	summary_text = ('''
	This application allows you to make activity predictions for
	biological targets for which you have only a little knowledge in
	terms of known active and inactive molecules.

	Provide via the sidebar:\n
	- some active molecules,
	- some inactive molecules, and
	- molecules you want to predict.

	Hit Predict and explore the predictions!

	For more information about the model and **how to provide the molecules**,
	please visit the **Additional Information** tab.

	If you encounter any problems, we would be glad if you could report them
	to us: [email protected].
	''')

	mhnfs_text = ('''
	<div style="text-align: justify">
	<b>MHNfs</b> is a few-shot drug discovery model which consists of a <b>context
	module</b>, a <b>cross-attention module</b>, and a <b>similarity module</b>
	as described here: <a href="https://openreview.net/pdf?id=XrMWUuEevr"
	target="_blank">https://openreview.net/pdf?id=XrMWUuEevr</a>.
	</div>
	<br>

	<div style="text-align: justify">
	<b>Abstract</b>. A central task in computational drug discovery is to construct
	models from known active molecules to find further promising molecules for
	subsequent screening. However, typically only very few active molecules are
	known. Therefore, few-shot learning methods have the potential to improve the
	effectiveness of this critical phase of the drug discovery process. We introduce
	a new method for few-shot drug discovery. Its main idea is to enrich a molecule
	representation by knowledge about known context or reference molecules. Our
	novel concept for molecule representation enrichment is to associate molecules
	from both the support set and the query set with a large set of reference
	(context) molecules through a modern Hopfield network. Intuitively, this
	enrichment step is analogous to a human expert who would associate a given
	molecule with familiar molecules whose properties are known. The enrichment step
	reinforces and amplifies the covariance structure of the data, while
	simultaneously removing spurious correlations arising from the decoration of
	molecules. Our approach is compared with other few-shot methods for drug
	discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms
	all compared methods and therefore sets a new state-of-the-art for few-shot
	learning in drug discovery. An ablation study shows that the enrichment step of
	our method is the key to improve the predictive quality. In a domain shift
	experiment, we further demonstrate the robustness of our method. Code is
	available at <a href="https://github.com/ml-jku/MHNfs"
	target="_blank">https://github.com/ml-jku/MHNfs</a>.
	</div>
	<br>
	<br>
	''')

	citation_text = '''
	###
	@inproceedings{
	schimunek2023contextenriched,
	title={Context-enriched molecule representations improve few-shot drug discovery},
	author={Johannes Schimunek and Philipp Seidl and Lukas Friedrich and Daniel Kuhn and Friedrich Rippmann and Sepp Hochreiter and Günter Klambauer},
	booktitle={The Eleventh International Conference on Learning Representations},
	year={2023},
	url={https://openreview.net/forum?id=XrMWUuEevr}
	}
	'''

	few_shot_learning_text = (
	'''
	<div style="text-align: justify">
	<b>Few-shot learning</b> is a machine learning sub-field which aims to provide
	predictive models for scenarios in which only little data is known/available.<br>
	<br>

	<b>MHNfs</b> is a few-shot learning model which is specifically designed for drug
	discovery applications. It is built to use the input prompts in a way such that
	the provided available knowledge, i.e. the known active and inactive molecules,
	functions as context to predict the activity of the new requested molecules.
	Precisely, the provided active and inactive molecules are associated with a
	large set of general molecules - called context molecules - to enrich the
	provided information and to remove spurious correlations arising from the
	decoration of molecules. This is analogous to a Large Language Model which would
	not only use the information provided in the current prompt as context but would
	also have access to far more information, e.g., a prompting history.
	</div>
	''')

	under_the_hood_text = ('''
	<div style="text-align: justify">
	The predictive model (MHNfs) used in this application was specifically designed and
	trained for low-data scenarios. The model predicts whether a molecule is active or
	inactive. The predicted activity value is a continuous value between 0 and 1, and,
	similar to a probability, the higher/lower the value, the more confident the model
	is that the molecule is active/inactive.

	The model was trained on the FS-Mol dataset, which
	includes 5120 tasks (roughly 5000 tasks were used for training, the rest for evaluation).
	The training tasks are listed here:
	<a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets"
	target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
	</div>
	''')
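
# A minimal sketch of how the predicted activity values described in
# under_the_hood_text could be turned into a ranked, labelled list. The function
# name and the 0.5 cut-off are assumptions made for illustration only; the text
# above states only that values lie between 0 and 1 and that higher values mean
# the model is more confident that a molecule is active.

```python
def rank_predictions(smiles, values, threshold=0.5):
    """Attach a label to each prediction and sort by predicted value, highest first.

    `threshold` is a hypothetical cut-off, not a value taken from the app.
    """
    rows = []
    for molecule, value in zip(smiles, values):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"Prediction {value} is outside the range [0, 1]")
        label = "active" if value >= threshold else "inactive"
        rows.append((molecule, value, label))
    # Molecules the model considers most likely to be active come first.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

# Values close to the cut-off carry little confidence either way, so the ranking
# is usually more informative than the hard label.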

	usage_text = ('''
	<div style="text-align: justify">
	To use this application, you need to provide <b>3 different sets of molecules</b>:
	<ol>
	<li><b>active</b> molecules: set of known active molecules,</li>
	<li><b>inactive</b> molecules: set of known inactive molecules, and</li>
	<li>molecules to <b>predict</b>: set of molecules you want to predict.</li>
	</ol>
	These three sets can be provided via the <b>sidebar</b>. The sidebar also includes two
	buttons <b>predict</b> and <b>reset</b> to run the prediction pipeline and to
	reset it.
	</div>
	''')

	data_text = ('''
	<div style="text-align: justify">
	<ul>
	<li> Molecules have to be provided in SMILES format</li>
	<li> For each input, the maximum number of molecules which can be provided is
	restricted to 20 </li>
	<li> You can provide the molecules via the text boxes or via CSV upload
	<ul>
	<li> Text box
	<ul>
	<li> Replace the placeholder input by typing your molecules directly into
	the text box </li>
	<li> Separate the molecules with commas </li>
	</ul>
	</li>
	<li> CSV upload
	<ul>
	<li> The CSV file should include a "smiles" column (both upper
	and lower case "SMILES" are accepted) </li>
	<li> All other columns will be ignored </li>
	<li> Examples are provided here:
	<div style="background-color: #efefef">
	assets/example_csv/
	</div>
	</li>
	</ul>
	</li>
	</ul>
	</li>
	</ul>
	</div>
	''')
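
# The input rules listed in data_text can be sketched as two small parsers. This
# is an illustration under stated assumptions, not the parsing code the app
# uses: the function names and MAX_MOLECULES are hypothetical, and the column
# match is fully case-insensitive, which is slightly more permissive than the
# "smiles"/"SMILES" wording above. The SMILES strings are not validated
# chemically here.

```python
import csv
import io

MAX_MOLECULES = 20  # per-input limit stated in data_text


def _check_limit(smiles):
    if len(smiles) > MAX_MOLECULES:
        raise ValueError(
            f"At most {MAX_MOLECULES} molecules are allowed, got {len(smiles)}"
        )
    return smiles


def parse_text_box(text):
    """Split comma-separated SMILES, dropping whitespace and empty entries."""
    return _check_limit([s.strip() for s in text.split(",") if s.strip()])


def parse_csv(csv_text):
    """Read the "smiles" column (any casing) and ignore all other columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    column = next(
        (name for name in (reader.fieldnames or []) if name.strip().lower() == "smiles"),
        None,
    )
    if column is None:
        raise ValueError('The CSV file needs a "smiles" column')
    values = [row[column] for row in reader]
    return _check_limit([v.strip() for v in values if v and v.strip()])
```

# Usage: parse_text_box("CCO, c1ccccc1") yields the two SMILES strings without
# surrounding whitespace, and parse_csv reads the same molecules from a file's
# text content.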

	trust_text = ('''
	<div style="text-align: justify">
	Like all machine learning models, the performance of MHNfs varies from task to
	task. In general, the model works well if the task is similar to the tasks which
	were used to train the model. The model performance for very different tasks is
	unclear and might be poor.<br>
	<br>

	MHNfs was trained on the FS-Mol dataset, which includes 5120 tasks (roughly
	5000 tasks were used for training, the rest for evaluation). The training tasks are
	listed here: <a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets"
	target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
	</div>
	''')

	example_trustworthy_text = ('''
	<div style="text-align: justify">
	Since the predictive model has seen many kinase-related tasks during training,
	it is expected to perform well on kinase targets in general. For this example,
	we use data for the target
	<a href="https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL5914/"
	target="_blank">CHEMBL5914</a>. Notably, this specific kinase has not been seen
	during training. Specifically, we use the available inhibition data, where molecules
	with an inhibition value greater (smaller) than 50 % are considered active
	(inactive).<br>

	From the known available data, we have selected 4 "known" active molecules,
	8 "known" inactive molecules, and 11 molecules to predict.<br>

	<b>Molecules to predict</b>:
	<div style="background-color: #efefef">
	FC(F)(F)c1ccc(Cl)cc1CN1CCNc2ncc(-c3ccnc(N4CCNCC4)c3)cc21,<br>
	CS(=O)(=O)c1ccc(-n2nc(-c3cnc4[nH]ccc4c3)c3c(N)ncnc32)cc1,<br>
	O=C(Nc1ccccc1Cl)c1cnc2ccc(C3CCNCC3)cn12.O=C(O)C(=O)O,<br>
	CC(C)n1cnc2c(Nc3cccc(Cl)c3)nc(N[C@@H]3CCCC[C@@H]3N)nc21,<br>
	Nc1ncc(-c2ccc(NS(=O)(=O)C3CC3)cc2F)cc1-c1ccc2c(c1)CCNC2=O,<br>
	CCN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(C#Cc4cccnc4)c3)cc2C(F)(F)F)CC1,<br>
	CN1CCN(c2ccc(-c3cnc4c(c3)N(Cc3cc(Cl)ccc3C(F)(F)F)CCN4)cn2)CC1,<br>
	CC(C)n1nc(-c2cnc(N)c(OC(F)(F)F)c2)cc1[C@H]1[C@@H]2CN(C3COC3)C[C@@H]21,<br>
	Nc1ncc(-c2cc([C@H]3[C@@H]4CN(C5COC5)C[C@@H]43)n(CC3CC3)n2)cc1C(F)(F)F,<br>
	Cc1ccc(NC(=O)C2(C(=O)Nc3ccc(Nc4ncc(F)c(-c5cc(F)c6nc(C)n(C(C)C)c6c5)n4)cc3)CC2)cc1,<br>
	C[C@@H](Oc1cc(-c2cnn(C3CCNCC3)c2)cnc1N)c1c(Cl)ccc(F)c1Cl
	</div><br>

	<b>Known active molecules</b>:
	<div style="background-color: #efefef">
	CC(=O)N1CCN(c2cc(-c3cnc4c(c3)N(Cc3cc(Cl)ccc3C(F)(F)F)CCN4)ccn2)CC1,<br>
	CS(=O)(=O)c1cccc(Nc2nccc(-c3sc(N4CCOCC4)nc3-c3cccc(NS(=O)(=O)c4c(F)cccc4F)c3)n2)c1,<br>
	COc1cnccc1Nc1nc(-c2nn(Cc3c(F)cc(OCCO)cc3F)c3ccccc23)ncc1OC,<br>
	CN(C)[C@@H]1CC[C@@]2(C)[C@@H](CC[C@@H]3[C@@H]2CC[C@]2(C)C(c4cccc5cnccc45)=CC[C@@H]32)C1<br>
	</div><br>

	<b>Known inactive molecules</b>:
	<div style="background-color: #efefef">
	c1cc(-c2c[nH]c3cnccc23)ccn1,<br>
	COc1ccc2c3ccnc(C(F)(F)F)c3n(CCCCN)c2c1,<br>
	CNS(=O)(=O)c1ccc(N(C)C)c(Nc2ncnc3cc(OC)c(OC)cc23)c1,<br>
	CN(C1CC1)S(=O)(=O)c1ccc(-c2cnc(N)c(-c3ccc4c(c3)CCNC4=O)c2)c(F)c1,<br>
	CCN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(C#Cc4cnc5[nH]ccc5c4)c3)cc2C(F)(F)F)CC1,<br>
	CC(C)n1cc(-c2cc(-c3ccc(CN4CCOCC4)cc3)cnc2N)nn1,<br>
	CC(C)(O)[C@H](F)CN1Cc2cc(NC(=O)c3cnn4cccnc34)c(N3CCOCC3)cc2C1=O,<br>
	[2H]C([2H])([2H])C1(C([2H])([2H])[2H])Cn2nc(-c3ccc(F)cn3)c(-c3ccnc4[nH]ncc34)c2CO1<br>
	</div><br>

	<b>Predictions</b>:<br>

	</div>
	''')

	example_nottrustworthy_text = ('''
	<div style="text-align: justify">
	For this example, we use data for the auxiliary transport protein target
	<a href="https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL5738/"
	target="_blank">CHEMBL5738</a>. Specifically, we use the available Ki data,
	where molecules with a pCHEMBL value greater (smaller) than 5 are considered
	active (inactive).<br>

	From the known available data, we have selected 4 "known" active molecules,
	3 "known" inactive molecules, and 10 molecules to predict.<br>

	<b>Molecules to predict</b>:
	<div style="background-color: #efefef">
	CC(C(=O)O)c1ccc(-c2ccccc2)c(F)c1,<br>
	O=S(=O)(O)Oc1cccc2cccc(Nc3ccccc3)c12,<br>
	CCCCCCCC/C=C\CCCCCCCC(=O)O,<br>
	C[C@]12C=CC(=O)C=C1CC[C@@H]1[C@@H]2[C@@H](O)C[C@@]2(C)[C@H]1CC[C@]2(O)C(=O)CO,<br>
	CCOC(=O)C(C)(C)Oc1ccc(Cl)cc1,<br>
	Cc1ccc(Cl)c(Nc2ccccc2C(=O)O)c1Cl,<br>
	O=C(O)Cc1ccccc1Nc1c(Cl)cccc1Cl,<br>
	CC(C)(Oc1ccc(CCNC(=O)c2ccc(Cl)cc2)cc1)C(=O)O,<br>
	O=C(c1ccccc1)c1ccc2n1CCC2C(=O)O,<br>
	CC(C)OC(=O)C(C)(C)Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1<br>
	</div><br>

	<b>Known active molecules</b>:
	<div style="background-color: #efefef">
	CC(C(=O)O)c1ccc(N2Cc3ccccc3C2=O)cc1,<br>
	CN1C(=O)CN=C(c2ccccc2)c2cc(Cl)ccc21,<br>
	CC(C)(Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1)C(=O)O,<br>
	CC(=O)[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C

	</div><br>

	<b>Known inactive molecules</b>:
	<div style="background-color: #efefef">
	CC(C)Cc1ccc(C(C)C(=O)O)cc1,<br>
	O=C1Nc2ccc(Cl)cc2C(c2ccccc2Cl)=NC1O,<br>
	C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO
	</div><br>

	<b>Predictions</b>:<br>

	</div>
	''')
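
# Both examples above derive their "known" labels from a measured value and a
# cut-off (inhibition above 50 % for CHEMBL5914, pCHEMBL above 5 for
# CHEMBL5738). A minimal sketch of that rule follows. The function and constant
# names are hypothetical, and returning None for a value exactly at the cut-off
# is an assumption, since the texts only cover "greater" and "smaller".

```python
INHIBITION_THRESHOLD = 50.0  # percent, as used in the CHEMBL5914 example
PCHEMBL_THRESHOLD = 5.0      # as used in the CHEMBL5738 example


def label_activity(value, threshold):
    """Return "active" above the threshold, "inactive" below it, None at it."""
    if value > threshold:
        return "active"
    if value < threshold:
        return "inactive"
    return None  # assumption: the source does not define this case
```

# Usage: label_activity(72.5, INHIBITION_THRESHOLD) gives "active", while
# label_activity(4.3, PCHEMBL_THRESHOLD) gives "inactive".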