ESM-Scan
Calculate the fitness of single amino acid substitutions on proteins, using a zero-shot language model predictor
If you use this tool in your research, please cite:
- Totaro, M.G. (2023). “ESM-Scan - a tool to guide amino acid substitutions.” bioRxiv. doi.org/10.1101/2023.12.12.571273
- Meier, J. (2021). “Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv (Cold Spring Harbor Laboratory), July. doi.org/10.1101/2021.07.09.450648
USAGE INSTRUCTIONS
Setup
No setup is required: just fill the input boxes with the required data and click the Run button.
A list of examples can be found at the bottom of the page; click on one to autofill the fields.
If the server is not used for some time, it will go into standby.
Running a calculation resumes the tool from standby; the first run might take longer due to startup and model loading.
Input
- write the full amino acid sequence of the protein to be analysed in the Sequence text box; wildcard characters (e.g. -X.B) can be inserted but, at the moment, the visualisation cannot handle them
- write the substitutions to test in the Substitutions box; there are four running modes, selected according to the input:
  - single substitution or list thereof (in the form R218K R218W): each listed substitution is scored
  - residue position or list thereof: all possible substitutions at those positions are evaluated
  - same-length sequence: the differing amino acids are evaluated as substitutions, one by one
  - any other input: a deep mutational scan of the full sequence is performed
- the ESM model to use for the calculations can be chosen among those available on the Hugging Face Model Hub; esm2_t33_650M_UR50D offers the best expense-accuracy tradeoff
- the more accurate masked-marginals scoring strategy considers sequence context during inference, which increases the runtime significantly; if the wait is too long, you can untick the box to speed up the calculations, sacrificing accuracy
- when running a deep mutational scan, it is recommended to use smaller models (8M, 35M, 150M parameters), since the runtime is significant, especially for longer sequences, and the server might be overloaded; over 30 min might be necessary for a 300-residue-long sequence with larger models
- in general, accuracy is influenced more by the scoring strategy than by the model size, so it is suggested to reduce the latter first when optimising for runtime; the computational cost of the scoring strategy scales with the number of substitutions tested, while that of the model scales with the wild-type sequence length
- it is possible to calculate the effect of multiple concurrent substitutions, but this has to be done manually, by changing the input sequence and running the calculation again
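The input rules above can be illustrated with a short Python sketch. It is not the tool's own code: the function names running_mode and apply_substitutions are hypothetical, the parsing rules are a guess at how the Substitutions box might be interpreted, and positions are assumed to be 1-based, as in R218K. The second function may help when preparing a sequence that carries several concurrent substitutions before pasting it into the Sequence box.

```python
import re

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches tokens such as R218K: wild-type residue, position, mutant residue.
SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def running_mode(sequence: str, user_input: str) -> str:
    """Guess which running mode an input would select (assumed rules)."""
    tokens = user_input.split()
    if tokens and all(SUBSTITUTION.match(t) for t in tokens):
        return "substitutions"
    if tokens and all(t.isdigit() for t in tokens):
        return "positions"
    if len(tokens) == 1 and len(tokens[0]) == len(sequence):
        return "same-length sequence"
    # Anything else, including an empty box, falls back to the full scan.
    return "deep mutational scan"


def apply_substitutions(sequence: str, substitutions: str) -> str:
    """Return the sequence carrying all listed substitutions (1-based positions)."""
    residues = list(sequence)
    for token in substitutions.split():
        match = SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"not a substitution: {token!r}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position out of range in {token!r}")
        # Check against the original sequence, not the partially mutated one.
        if sequence[position - 1] != wild_type:
            raise ValueError(f"wild-type residue mismatch in {token!r}")
        residues[position - 1] = mutant
    return "".join(residues)
```

For example, apply_substitutions("MKTAYIAKQR", "K2R A4G") returns "MRTGYIAKQR", which can then be used as the new input sequence for a further run.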
Output
Your results will be shown in a color-coded table, except for the deep mutational scan, which yields a heatmap.
The output data can be downloaded from the box at the bottom.
File extensions are not supported by the server and need to be appended to the filenames after downloading:
- CSV for tables
- SVG for full-sequence deep mutational scans
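Appending the extension can also be scripted. The following Python sketch is one possible approach, not part of the tool: guess_extension and append_extension are hypothetical helpers, and the detection rule (a file starting with an XML declaration or an svg tag is treated as SVG, anything else as CSV) is an assumption about what the downloaded files look like.

```python
from pathlib import Path


def guess_extension(path: Path) -> str:
    """Guess .svg or .csv from the first bytes of the file (assumed rule)."""
    head = path.read_bytes()[:512].lstrip().lower()
    if head.startswith(b"<?xml") or head.startswith(b"<svg"):
        return ".svg"
    return ".csv"


def append_extension(path: Path) -> Path:
    """Rename the file so that it ends in .csv or .svg; return the new path."""
    if path.suffix.lower() in {".csv", ".svg"}:
        return path  # already has a usable extension
    target = path.with_name(path.name + guess_extension(path))
    path.rename(target)
    return target
```

Calling append_extension(Path("downloaded_file")) renames the file in place and returns the new path; files that already end in .csv or .svg are left untouched.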