Spaces:
Sleeping
Sleeping
# Copyright 2018-2022 Streamlit Inc. | |
# | |
# Licensed under the Apache License, Version 2.0 (the "License"); | |
# you may not use this file except in compliance with the License. | |
# You may obtain a copy of the License at | |
# | |
# http://www.apache.org/licenses/LICENSE-2.0 | |
# | |
# Unless required by applicable law or agreed to in writing, software | |
# distributed under the License is distributed on an "AS IS" BASIS, | |
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
# See the License for the specific language governing permissions and | |
# limitations under the License. | |
import streamlit as st | |
from streamlit.logger import get_logger | |
LOGGER = get_logger(__name__) | |
def run(): | |
st.set_page_config( | |
page_title="About WarmMolGen", | |
page_icon="🚀", | |
layout='wide' | |
) | |
st.write("## [Exploiting Pretrained Biochemical Language Models for Targeted Drug Design](https://arxiv.org/abs/2209.00981)") | |
#st.sidebar.title("Model Demos") | |
st.sidebar.success("Select a model demo above.") | |
st.markdown( | |
""" | |
This application demonstrates the generation capabilities of the models trained as part of the study below, which has been published in *Bioinformatics* Published by Oxford University Press. The available models are: | |
* WarmMolGen | |
- WarmMolGenOne (i.e. EncDecBase) | |
- WarmMolGenTwo (i.e. EncDecLM) | |
* ChemBERTaLM | |
👈 Select a model demo from the sidebar to generate molecules right away 🚀 | |
### Abstract | |
**Motivation:** The development of novel compounds targeting proteins of interest is one of the most important tasks in | |
the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown | |
promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein | |
language and the chemical language. However, such a model is limited by the availability of interacting protein–ligand | |
pairs. On the other hand, large amounts of unlabelled protein sequences and chemical compounds are available and | |
have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained | |
biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate | |
two warm start strategies: (i) a one-stage strategy where the initialized model is trained on targeted molecule generation | |
and (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target-specific training. We | |
also compare two decoding strategies to generate compounds: beam search and sampling. | |
**Results:** The results show that the warm-started models perform better than a baseline model trained from scratch. | |
The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics | |
from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests | |
that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that | |
beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound | |
quality. | |
**Availability and implementation:** The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials (i.e., data, models, and outputs) are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145. | |
### Citation | |
```bibtex | |
@article{10.1093/bioinformatics/btac482, | |
author = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O and Karalı, Nilgün and Özgür, Arzucan}, | |
title = "{Exploiting pretrained biochemical language models for targeted drug design}", | |
journal = {Bioinformatics}, | |
volume = {38}, | |
number = {Supplement_2}, | |
pages = {ii155-ii161}, | |
year = {2022}, | |
doi = {10.1093/bioinformatics/btac482}, | |
url = {https://doi.org/10.1093/bioinformatics/btac482}, | |
} | |
``` | |
""" | |
) | |
# page_names_to_funcs = { | |
# "—": intro, | |
# "Plotting Demo": plotting_demo, | |
# "Mapping Demo": mapping_demo, | |
# "DataFrame Demo": data_frame_demo | |
# } | |
# demo_name = st.sidebar.selectbox("Choose a demo", page_names_to_funcs.keys()) | |
# page_names_to_funcs[demo_name]() | |
if __name__ == "__main__": | |
run() |