selfies-ted
selfies-ted is a transformer-based encoder-decoder model for molecular representations using SELFIES.
Usage
Import
from transformers import AutoTokenizer, AutoModel
import selfies as sf
import torch
Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ibm/materials.selfies-ted")
model = AutoModel.from_pretrained("ibm/materials.selfies-ted")
Encode SMILES strings to SELFIES
smiles = "c1ccccc1"
selfies = sf.encoder(smiles)
selfies = selfies.replace("][", "] [")
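The `replace` call above puts a space between adjacent SELFIES symbols, presumably so that the tokenizer can split the string on whitespace. A minimal sketch of the same step as a reusable helper follows. The names `space_separate` and `space_separate_all` are hypothetical and not part of the model's API, and the example string is only an illustrative SELFIES string.

```python
def space_separate(selfies_string: str) -> str:
    """Insert a space between adjacent SELFIES symbols, e.g. "[C][=C]" -> "[C] [=C]"."""
    return selfies_string.replace("][", "] [")


def space_separate_all(selfies_strings):
    """Apply space_separate to every string in a list."""
    return [space_separate(s) for s in selfies_strings]


# Illustrative SELFIES string (an assumption, not output produced by the model card).
example = "[C][=C][C][=C][C][=C][Ring1][=Branch1]"
print(space_separate(example))
# [C] [=C] [C] [=C] [C] [=C] [Ring1] [=Branch1]
```

A string with a single symbol contains no `][` boundary, so it is returned unchanged.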
Get embedding
# Tokenize and pad/truncate to a fixed length of 128 tokens
token = tokenizer(selfies, return_tensors='pt', max_length=128, truncation=True, padding='max_length')
input_ids = token['input_ids']
attention_mask = token['attention_mask']
# Run only the encoder to get per-token hidden states
outputs = model.encoder(input_ids=input_ids, attention_mask=attention_mask)
model_output = outputs.last_hidden_state
# Mean-pool over tokens, using the attention mask to ignore padding positions
input_mask_expanded = attention_mask.unsqueeze(-1).expand(model_output.size()).float()
sum_embeddings = torch.sum(model_output * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# One embedding vector per input molecule
model_output = sum_embeddings / sum_mask
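The pooling step above averages the encoder's hidden states over the non-padding tokens only. As a sketch of that arithmetic in isolation, the function below (hypothetically named `mean_pool`) applies the same operations to a small hand-made tensor instead of real model output, so the result can be checked by hand. The dummy values and shapes are assumptions chosen for illustration.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, ignoring positions where attention_mask is 0."""
    # Broadcast the (batch, seq_len) mask to (batch, seq_len, hidden_size).
    mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    summed = torch.sum(last_hidden_state * mask, dim=1)
    # Clamp avoids division by zero for an all-padding row.
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts


# Dummy input: batch of 1, three tokens, hidden size 2; the last token is padding.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

pooled = mean_pool(hidden, mask)
print(pooled)  # tensor([[2., 3.]])
```

The padded token's values (100, 100) do not affect the result: only the first two tokens are averaged, giving (1 + 3) / 2 = 2 and (2 + 4) / 2 = 3.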