ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd


	---
	tags:
	- protein
	- small-molecule
	- dti
	- ibm
	- mammal
	- pytorch
	- transformers
	library_name: biomed
	license: apache-2.0
	base_model:
	- ibm/biomed.omics.bl.sm.ma-ted-400m
	---

	Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery.
	This model is an example of fine-tuning `ibm/biomed.omics.bl.sm.ma-ted-400m` for this task.
	It predicts binding affinity as pKd, the negative logarithm of the dissociation constant, which reflects the strength of the interaction between a small molecule (drug) and a protein (target).
	The expected inputs for the model are the amino acid sequence of the target and the SMILES representation of the drug.
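
To make the pKd scale concrete, the sketch below converts a dissociation constant to pKd and back. It assumes Kd is given in nanomolar and is converted to mol/L before taking the base-10 logarithm; the function names are illustrative and are not part of the MAMMAL code base.

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm * 1e-9)


def pkd_to_kd_nm(pkd: float) -> float:
    """Convert pKd back to a dissociation constant in nM."""
    return 10.0 ** (-pkd) * 1e9


# A lower Kd (tighter binding) gives a higher pKd.
print(kd_nm_to_pkd(1.0))     # Kd = 1 nM
print(kd_nm_to_pkd(1000.0))  # Kd = 1 uM
print(pkd_to_kd_nm(7.0))     # pKd = 7, in nM
```

On this scale, each unit of pKd corresponds to a tenfold change in Kd.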

	The benchmark used for fine-tuning is defined at: `https://tdcommons.ai/multi_pred_tasks/dti/`
	We also harmonize the values using `data.harmonize_affinities(mode = 'max_affinity')` and transform them to log scale.
	By default, we use the Drug+Target cold split, as provided by tdcommons.
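
The three preparation steps above can be illustrated with a small pure-Python sketch. It is not the tdcommons implementation: it assumes that "max affinity" harmonization keeps the lowest Kd for each repeated drug-target pair, that Kd values are in nM, and that a Drug+Target cold split places in the test set only pairs whose drug and target are both held out. All function names and data are hypothetical.

```python
import math
import random


def harmonize_max_affinity(records):
    """Collapse repeated (drug, target) pairs, keeping the lowest Kd (strongest binding)."""
    best = {}
    for drug, target, kd_nm in records:
        key = (drug, target)
        if key not in best or kd_nm < best[key]:
            best[key] = kd_nm
    return [(drug, target, kd_nm) for (drug, target), kd_nm in best.items()]


def to_pkd(kd_nm):
    """Convert Kd in nM to pKd = -log10(Kd in mol/L)."""
    return -math.log10(kd_nm * 1e-9)


def drug_target_cold_split(records, test_frac=0.5, seed=0):
    """Hold out a fraction of drugs and of targets; test pairs use only held-out entities."""
    rng = random.Random(seed)
    drugs = sorted({r[0] for r in records})
    targets = sorted({r[1] for r in records})
    test_drugs = set(rng.sample(drugs, max(1, round(len(drugs) * test_frac))))
    test_targets = set(rng.sample(targets, max(1, round(len(targets) * test_frac))))
    train = [r for r in records if r[0] not in test_drugs and r[1] not in test_targets]
    test = [r for r in records if r[0] in test_drugs and r[1] in test_targets]
    # Pairs that mix a held-out entity with a training entity are discarded.
    return train, test


# Toy measurements: (drug id, target id, Kd in nM). All values are made up.
raw = [
    ("D1", "T1", 100.0), ("D1", "T1", 10.0),  # repeated pair
    ("D1", "T2", 500.0), ("D2", "T1", 50.0),
    ("D2", "T2", 1000.0), ("D3", "T3", 1.0), ("D4", "T4", 20.0),
]
harmonized = harmonize_max_affinity(raw)
labeled = [(d, t, to_pkd(kd)) for d, t, kd in harmonized]
train, test = drug_target_cold_split(labeled)
print(len(harmonized), len(train), len(test))
```

Because no drug or target is shared between the two sets, a cold split measures how well the model generalizes to unseen molecules and unseen proteins at the same time.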


	## Model Summary

	- Developers: IBM Research
	- GitHub Repository: https://github.com/BiomedSciAI/biomed-multi-alignment
	- Paper: TBD
	- Release Date: Oct 28th, 2024
	- License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

	## Usage

	Using `ibm/biomed.omics.bl.sm.ma-ted-400m` requires installing the [biomed-multi-alignment](https://github.com/BiomedSciAI/biomed-multi-alignment) package:

	```
	pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git
	```

	A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-400m`:
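	```python
	from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp
	from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
	from mammal.model import Mammal

	# Example inputs (illustrative; replace with your own target and drug)
	target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"  # amino acid sequence
	drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"  # SMILES

	# Label normalization statistics used during fine-tuning.
	# Their values are not given in this card; set them before running.
	norm_y_mean = None
	norm_y_std = None

	# Load Model
	model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd")
	model.eval()

	# Load Tokenizer
	tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd")

	# convert to MAMMAL style
	sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
	sample_dict = DtiBindingdbKdTask.data_preprocessing(
	    sample_dict=sample_dict,
	    tokenizer_op=tokenizer_op,
	    target_sequence_key="target_seq",
	    drug_sequence_key="drug_seq",
	    norm_y_mean=None,
	    norm_y_std=None,
	    device=model.device,
	)

	# forward pass - encoder_only mode which supports scalars predictions
	batch_dict = model.forward_encoder_only([sample_dict])

	# Post-process the model's output
	batch_dict = DtiBindingdbKdTask.process_model_output(
	    batch_dict,
	    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
	    norm_y_mean=norm_y_mean,
	    norm_y_std=norm_y_std,
	)
	ans = {
	    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
	}

	# Print prediction
	print(f"{ans=}")
	```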
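
The post-processing call above receives `norm_y_mean` and `norm_y_std`. Assuming these are the mean and standard deviation used to z-score the pKd labels during fine-tuning (an assumption, not something this card states), undoing the normalization would look like the sketch below. The helper names and numeric values are hypothetical.

```python
def normalize(pkd: float, norm_y_mean: float, norm_y_std: float) -> float:
    """Z-score a pKd label (assumed training-time transform)."""
    return (pkd - norm_y_mean) / norm_y_std


def denormalize(pred: float, norm_y_mean: float, norm_y_std: float) -> float:
    """Map a normalized prediction back to the pKd scale."""
    return pred * norm_y_std + norm_y_mean


# Made-up statistics, for illustration only.
mean, std = 6.0, 1.5
raw_pred = 0.5  # hypothetical normalized model output
print(denormalize(raw_pred, mean, std))
```

Under this assumption, passing statistics that differ from those used in training would shift and rescale every prediction, so they should match the fine-tuning configuration.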

	For more advanced usage, see the detailed examples at: `https://github.com/BiomedSciAI/biomed-multi-alignment`


	## Citation

	If you found our work useful, please consider giving the repo a star and citing our paper:
	```
	@article{TBD,
	title={TBD},
	author={IBM Research Team},
	journal={arXiv preprint arXiv:TBD},
	year={2024}
	}
	```