ESM zero-shot variant prediction
This was inspired by this paper and adapted from this repo.
Instructions
- in the 'sequence' text box, enter the full amino acid sequence of the protein to be analysed; wildcard characters (e.g. -X.B) are supported (but at the moment the visualisation does not show the correct results)
- there are three running modes, selected by the input in the 'substitution' box:
  - if another sequence is given, the positions that differ between the two sequences are evaluated and their scores returned (NB: the two sequences must be of the same length)
  - if a list of integers is given, a deep mutational scan is performed at those positions of the input sequence, and the scores for every amino acid other than the original one are returned
  - if a single substitution or a list of substitutions is given (in the form B008S), the score of each substitution is returned
- you can choose which ESM model to use for the calculations; the models offered are the ones available at runtime on the Hugging Face Model Hub
- there are two scoring strategies available: wt-marginals and masked marginals; the first is faster but less accurate, while the second considers the sequence context more thoroughly but is noticeably slower (the run time scales linearly with sequence length)
- the results are shown in a color-coded table, sorted by fitness (if performing a deep mutational scan)
- the output data is available for download from the box at the bottom as a CSV file
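
The three input modes of the 'substitution' box described above can be told apart with a few simple checks. The following is a minimal sketch, not the code this Space actually runs: the function name `parse_substitutions`, the mode labels, the comma/whitespace separators, and the 1-based position convention are all assumptions made for illustration.

```python
import re

# Assumed format: original residue, (possibly zero-padded) position, new residue.
SUBSTITUTION_RE = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_substitutions(sequence, text):
    """Return (mode, items) for the text typed into the 'substitution' box.

    Hypothetical helper: positions are assumed to be 1-based.
    """
    tokens = text.replace(",", " ").split()
    if not tokens:
        raise ValueError("empty substitution input")

    # A list of integers: deep mutational scan at those positions.
    if all(tok.isdigit() for tok in tokens):
        return "scan", [int(tok) for tok in tokens]

    # One or more substitutions such as B008S.
    matches = [SUBSTITUTION_RE.match(tok) for tok in tokens]
    if all(matches):
        return "substitutions", [
            (m.group(1).upper(), int(m.group(2)), m.group(3).upper())
            for m in matches
        ]

    # Anything else given as a single token is treated as a second sequence.
    if len(tokens) == 1:
        other = tokens[0].upper()
        if len(other) != len(sequence):
            raise ValueError("sequences must be of the same length")
        diffs = [
            (wt, pos, mt)
            for pos, (wt, mt) in enumerate(zip(sequence.upper(), other), start=1)
            if wt != mt
        ]
        return "compare", diffs

    raise ValueError("unrecognised substitution input")
```

For example, under these assumptions `parse_substitutions("MKTAY", "2 4")` selects the scan mode for positions 2 and 4, while `parse_substitutions("MKTAY", "MRTAW")` reports the two positions where the sequences differ.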
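
The difference between the two scoring strategies mentioned above can be sketched as follows. This is an illustrative outline only: `log_probs` is a hypothetical stand-in for a forward pass of an ESM model (it takes a list of tokens and returns, for each position, a dictionary of per-residue log-probabilities), `toy_log_probs` is a made-up model used solely to make the example runnable, and the `<mask>` token string and 1-based positions are assumptions.

```python
import math

MASK = "<mask>"  # assumed mask token
ALPHABET = "ACD"  # tiny toy alphabet, for illustration only


def toy_log_probs(tokens):
    """Made-up stand-in for a language model forward pass.

    Visible residues get probability 0.8 (0.1 for each other letter);
    masked positions get a uniform distribution.
    """
    out = []
    for tok in tokens:
        if tok == MASK:
            out.append({aa: math.log(1 / len(ALPHABET)) for aa in ALPHABET})
        else:
            out.append({aa: math.log(0.8 if aa == tok else 0.1) for aa in ALPHABET})
    return out


def _check_wildtype(tokens, pos, wt):
    if tokens[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {tokens[pos - 1]}")


def wt_marginal_score(log_probs, sequence, pos, wt, mt):
    """Score mt vs wt at pos from a pass over the unmasked sequence."""
    tokens = list(sequence)
    _check_wildtype(tokens, pos, wt)
    lp = log_probs(tokens)[pos - 1]
    return lp[mt] - lp[wt]


def masked_marginal_score(log_probs, sequence, pos, wt, mt):
    """Score mt vs wt at pos from a pass with that position masked."""
    tokens = list(sequence)
    _check_wildtype(tokens, pos, wt)
    tokens[pos - 1] = MASK
    lp = log_probs(tokens)[pos - 1]
    return lp[mt] - lp[wt]
```

In this sketch the wt-marginals output for the unmasked sequence could be computed once and reused for every position, whereas masked marginals needs a separate pass for each masked position, which is consistent with the run time growing with sequence length.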