---
title: MHNfs
emoji: 🔬
short_description: Activity prediction for low-data scenarios
colorFrom: gray
colorTo: gray
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: true
---

# Activity Predictions with MHNfs for low-data scenarios

## ⚙️ Under the hood
<div style="text-align: justify">
The predictive model (MHNfs) used in this application was specifically designed and
trained for low-data scenarios. The model predicts whether a molecule is active or
inactive. The predicted activity is a continuous value between 0 and 1. Similar to a
probability, values close to 1 indicate high confidence that the molecule is active,
and values close to 0 indicate high confidence that it is inactive.<br>
<br>
The model was trained on the FS-Mol dataset, which
includes 5120 tasks (roughly 5000 tasks were used for training, the rest for evaluation).
The training tasks are listed here:
<a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets"
target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
</div>

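Because the model returns a score between 0 and 1 rather than a hard label, a screening workflow usually needs a cutoff to turn scores into active/inactive calls. The sketch below shows one way to do this. The `label_predictions` helper and the 0.5 threshold are illustrative assumptions, not part of MHNfs, and the scores are made-up placeholder values.

```python
from typing import List, Sequence


def label_predictions(scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Map activity scores in [0, 1] to "active"/"inactive" labels.

    Hypothetical helper: the 0.5 default is an assumption, not a value
    recommended by the MHNfs authors.
    """
    labels = []
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [0, 1]")
        labels.append("active" if score >= threshold else "inactive")
    return labels


# Placeholder scores, not real model output
example_scores = [0.91, 0.48, 0.07]
example_labels = label_predictions(example_scores)
print(example_labels)
```

A stricter threshold trades fewer false positives for more missed actives, so the cutoff is best chosen per screening campaign.
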
## 🎯 About few-shot learning and the model MHNfs
<div style="text-align: justify">
<b>Few-shot learning</b> is a subfield of machine learning that aims to provide
predictive models for scenarios in which only a small amount of data is available.<br>
<br>
<b>MHNfs</b> is a few-shot learning model specifically designed for drug
discovery applications. It is built to use the input prompts such that
the provided knowledge, i.e. the known active and inactive molecules,
serves as context for predicting the activity of the newly requested molecules.
More precisely, the provided active and inactive molecules are associated with a
large set of general molecules, called context molecules, to enrich the
provided information and to remove spurious correlations arising from the
decoration of molecules. This is analogous to a Large Language Model that would
not only use the information provided in the current prompt as context but would
also have access to far more information, e.g., a prompting history.
</div>

## 💻 Run the prediction pipeline locally for larger screening chunks

### Get started:
```bash
# Copied from Hugging Face
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

# Clone repo
git clone https://huggingface.co/spaces/tschouis/mhnfs

# Alternatively, if you want to clone without large files
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/spaces/tschouis/mhnfs
```

### Install requirements
```bash
pip install -r requirements.txt
```
This command was tested inside a conda environment with Python 3.7.

### Run the prediction pipeline:
For your screening, load the model, i.e. the **Activity Predictor**, into your Python file or notebook and run it:
```python
from src.prediction_pipeline import ActivityPredictor

# Instantiate the predictor (assumes the constructor needs no arguments)
predictor = ActivityPredictor()

# Define inputs
query_smiles = ["C1CCCCC1", "C1CCCCC1", "C1CCCCC1", "C1CCCCC1"] # Replace with your data
support_actives_smiles = ["C1CCCCC1", "C1CCCCC1"] # Replace with your data
support_inactives_smiles = ["C1CCCCC1", "C1CCCCC1"] # Replace with your data

# Make predictions
predictions = predictor.predict(query_smiles, support_actives_smiles, support_inactives_smiles)
```

* Provide molecules in SMILES notation.
* Make sure that the inputs to the Activity Predictor are comma-separated lists, flattened NumPy arrays, or pandas DataFrames. A DataFrame must contain a "smiles" column (both lower-case "smiles" and upper-case "SMILES" are accepted). All other columns are ignored.

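For larger screening libraries, the query molecules often live in a CSV file. The following is a minimal sketch, using only the Python standard library, of reading the SMILES column into a plain list and splitting it into chunks. The helpers `read_smiles_column` and `iter_chunks`, the chunk size, and the in-memory example CSV are all hypothetical and not part of this repository.

```python
import csv
import io
from typing import Iterator, List, Sequence


def read_smiles_column(csv_text: str) -> List[str]:
    """Return the values of the "smiles" (or "SMILES") column as a list."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Accept both spellings mentioned above; all other columns are ignored
    column = next((name for name in fieldnames if name in ("smiles", "SMILES")), None)
    if column is None:
        raise ValueError('expected a "smiles" or "SMILES" column')
    # Skip rows whose SMILES cell is missing or empty
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


def iter_chunks(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


# Tiny in-memory CSV standing in for a screening library file
example_csv = "id,SMILES\nmol1,C1CCCCC1\nmol2,CCO\nmol3,c1ccccc1\n"
query_smiles = read_smiles_column(example_csv)
chunks = list(iter_chunks(query_smiles, chunk_size=2))
print(chunks)
```

Each chunk could then be passed as the query input to `predictor.predict` together with the same support actives and inactives. Whether chunking is needed at all depends on the size of the library and the available memory.
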
### Run the app locally with streamlit:
```bash
# Navigate into root directory of this project
cd .../whatever_your_dir_name_is/ # Replace with your path

# Run streamlit app
python -m streamlit run app.py
```

## 📚 Cite us

```
@inproceedings{
    schimunek2023contextenriched,
    title={Context-enriched molecule representations improve few-shot drug discovery},
    author={Johannes Schimunek and Philipp Seidl and Lukas Friedrich and Daniel Kuhn and Friedrich Rippmann and Sepp Hochreiter and Günter Klambauer},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=XrMWUuEevr}
}
```