---
tags:
- protein
- small-molecule
- dti
- ibm
- mammal
- pytorch
- transformers
library_name: biomed
license: apache-2.0
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-400m
---
Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery.
This model is an example of fine-tuning `ibm/biomed.omics.bl.sm.ma-ted-400m` on this task.
It predicts binding affinity as pKd, the negative logarithm of the dissociation constant, which reflects the strength of the interaction between a small molecule (drug) and a protein (target).
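For reference, the predicted quantity can be written as below. This assumes the dissociation constant $K_d$ is expressed in mol/L, a unit convention this card does not state explicitly.

$$
\mathrm{p}K_d = -\log_{10} K_d
$$

Under that assumption, $K_d = 1\,\mathrm{nM} = 10^{-9}\,\mathrm{M}$ gives $\mathrm{p}K_d = 9$ and $K_d = 1\,\mu\mathrm{M} = 10^{-6}\,\mathrm{M}$ gives $\mathrm{p}K_d = 6$, so a higher pKd means stronger binding.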
The expected inputs for the model are the amino acid sequence of the target and the SMILES representation of the drug.
The benchmark used for fine-tuning is defined at: `https://tdcommons.ai/multi_pred_tasks/dti/`
We also harmonize the values using `data.harmonize_affinities(mode = 'max_affinity')` and transform them to log scale.
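The effect of this preprocessing can be illustrated with a small standalone sketch. It is not the tdcommons implementation: it assumes that `max_affinity` keeps the strongest measurement (smallest Kd) for each drug-target pair and that Kd values are given in nM. The records and helper names are hypothetical.

```python
import math

# Hypothetical records: (drug SMILES, target sequence, measured Kd in nM).
measurements = [
    ("CCO", "MKT", 500.0),
    ("CCO", "MKT", 50.0),    # repeated measurement of the same pair
    ("CCN", "MKT", 2000.0),
]

def harmonize_max_affinity(rows):
    """Keep the smallest Kd (strongest binding) per drug-target pair (assumed semantics)."""
    best = {}
    for drug, target, kd_nm in rows:
        key = (drug, target)
        if key not in best or kd_nm < best[key]:
            best[key] = kd_nm
    return best

def to_pkd(kd_nm):
    """Log-scale transform: pKd = -log10(Kd in mol/L), with Kd given in nM."""
    return -math.log10(kd_nm * 1e-9)

labels = {pair: to_pkd(kd) for pair, kd in harmonize_max_affinity(measurements).items()}
for pair, pkd in labels.items():
    print(pair, round(pkd, 2))
```

The duplicated pair collapses to a single label, so each drug-target pair contributes one regression target.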
By default, we use the Drug+Target cold split, as provided by tdcommons.
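To make the idea of a Drug+Target cold split concrete, here is a minimal sketch in which no test drug and no test target appears in the training set. It is an illustration only, not the tdcommons code; the function name, the toy data, and the choice to drop pairs with exactly one held-out entity are assumptions.

```python
import random

def cold_split(pairs, test_frac=0.2, seed=0):
    """Split (drug, target, label) rows so that test drugs and targets are unseen in training."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in pairs})
    targets = sorted({t for _, t, _ in pairs})
    # Hold out a fraction of the drugs and a fraction of the targets.
    test_drugs = set(rng.sample(drugs, max(1, int(len(drugs) * test_frac))))
    test_targets = set(rng.sample(targets, max(1, int(len(targets) * test_frac))))
    train, test = [], []
    for d, t, y in pairs:
        if d in test_drugs and t in test_targets:
            test.append((d, t, y))
        elif d not in test_drugs and t not in test_targets:
            train.append((d, t, y))
        # Pairs with exactly one held-out entity are dropped in this sketch.
    return train, test

# Hypothetical toy data: every drug paired with every target.
pairs = [(f"drug{i}", f"target{j}", 5.0 + 0.1 * (i + j)) for i in range(5) for j in range(5)]
train, test = cold_split(pairs)
print(len(train), len(test))
```

This is a stricter evaluation than a random split, because the model cannot rely on having seen either the molecule or the protein during training.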
## Model Summary
- **Developers:** IBM Research
- **GitHub Repository:** https://github.com/BiomedSciAI/biomed-multi-alignment
- **Paper:** TBD
- **Release Date:** Oct 28th, 2024
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Usage
Using `ibm/biomed.omics.bl.sm.ma-ted-400m` requires installing [biomed-multi-alignment](https://github.com/BiomedSciAI/biomed-multi-alignment):
```
pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git
```
A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-400m`:
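```python
# Import paths assume the package layout of the biomed-multi-alignment repository
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp
from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
from mammal.model import Mammal

# Illustrative inputs: target amino acid sequence and drug SMILES (aspirin)
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)OC1=CC=CC=C1C(=O)O"

# Label normalization statistics: placeholders, set them to the values used for fine-tuning
norm_y_mean = None
norm_y_std = None

# Load Model
model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")
model.eval()

# Load Tokenizer
tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")

# Convert to MAMMAL style
sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
sample_dict = DtiBindingdbKdTask.data_preprocessing(
    sample_dict=sample_dict,
    tokenizer_op=tokenizer_op,
    target_sequence_key="target_seq",
    drug_sequence_key="drug_seq",
    norm_y_mean=None,
    norm_y_std=None,
    device=model.device,
)

# Forward pass in encoder-only mode, which supports scalar predictions
batch_dict = model.forward_encoder_only([sample_dict])

# Post-process the model's output
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=norm_y_mean,
    norm_y_std=norm_y_std,
)
ans = {
    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
}

# Print prediction
print(f"{ans=}")
```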
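The parameter names `norm_y_mean` and `norm_y_std` suggest that labels are z-score normalized during fine-tuning. Assuming that is the case (it is inferred from the names, not stated in this card), the following standalone sketch shows how a normalized prediction would map back to pKd, and how a pKd value translates to a dissociation constant in nM. Both helper functions and all numbers are hypothetical.

```python
def denormalize(pred, mean, std):
    """Undo z-score normalization (assumed scheme): pred * std + mean."""
    return pred * std + mean

def pkd_to_kd_nm(pkd):
    """Convert pKd to Kd in nM: Kd[M] = 10**(-pKd), so Kd[nM] = 10**(9 - pKd)."""
    return 10.0 ** (9.0 - pkd)

# Hypothetical values, for illustration only.
normalized_pred = 0.5
pkd = denormalize(normalized_pred, mean=6.0, std=2.0)
print(f"pKd = {pkd:.2f}, Kd = {pkd_to_kd_nm(pkd):.1f} nM")
```

Reporting Kd alongside pKd can make predictions easier to compare with experimentally measured affinities.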
For more advanced usage, see our detailed example at: `https://github.com/BiomedSciAI/biomed-multi-alignment`
## Citation
If you found our work useful, please consider starring the repo and citing our paper:
```
@article{TBD,
title={TBD},
author={IBM Research Team},
journal={arXiv preprint arXiv:TBD},
year={2024}
}
```