# **ESM zero-shot variant prediction**
this was inspired by this [paper](https://doi.org/10.1101/2021.07.09.450648) and adapted from [this repo](https://github.com/facebookresearch/esm/tree/main/esm)
#### **Instructions**
- in the 'sequence' text box, enter the full amino acid sequence of the protein to be analysed; wildcard characters (e.g. -X.B) are supported (but at the moment the visualisation does not show the correct results)
- there are three running modes, selected by the input in the 'substitution' box:
  - if another sequence is given, the positions that differ between the two sequences will be evaluated and their scores returned (NB the sequences must be of the same length)
  - if a list of integers is given, a deep mutational scan will be performed at those positions in the input sequence, and the scores for every amino acid other than the original one will be returned
  - if a single substitution or a list thereof is given (in the form of **B008S**), the score of each substitution is returned
- you can choose which ESM model to use for the calculations; the models offered are those available at runtime on the Hugging Face Model Hub
- there are two scoring strategies available: wt-marginals and masked marginals; the first is faster but less accurate, while the second considers the sequence context more thoroughly but is noticeably slower (the run time scales linearly with sequence length)
- the results will be shown in a table, with color coding and sorted by fitness (if performing a deep mutational scan)
- the output data is available for download from the box at the bottom as a CSV file
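
The three running modes can be told apart from the content of the 'substitution' box alone. The following is a minimal sketch of one way to do that, not the code used by this app: the function name `parse_substitution_box`, the `SUBSTITUTION` pattern, the accepted separators (spaces, commas, semicolons) and the use of 1-based positions are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for a substitution such as "B008S":
# original residue, position (optionally zero-padded), new residue.
SUBSTITUTION = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_substitution_box(sequence: str, text: str):
    """Return (mode, payload) for the content of the 'substitution' box.

    Positions are assumed to be 1-based; this is an illustrative choice.
    """
    tokens = text.replace(",", " ").replace(";", " ").split()
    if not tokens:
        raise ValueError("the substitution box is empty")

    # A list of integers: deep mutational scan at those positions.
    if all(tok.isdecimal() for tok in tokens):
        return "scan", [int(tok) for tok in tokens]

    # One or more substitutions such as B008S.
    matches = [SUBSTITUTION.match(tok) for tok in tokens]
    if all(matches):
        return "substitutions", [
            (m.group(1).upper(), int(m.group(2)), m.group(3).upper())
            for m in matches
        ]

    # Anything else made of a single token is treated as a second sequence,
    # which is compared with the input sequence position by position.
    if len(tokens) == 1:
        other = tokens[0].upper()
        if len(other) != len(sequence):
            raise ValueError("the two sequences must have the same length")
        return "sequence", [
            (wt, pos, mt)
            for pos, (wt, mt) in enumerate(zip(sequence.upper(), other), start=1)
            if wt != mt
        ]

    raise ValueError("unrecognised input in the substitution box")


if __name__ == "__main__":
    seq = "MKTAYIAK"
    print(parse_substitution_box(seq, "2, 5"))
    print(parse_substitution_box(seq, "K002R A4G"))
    print(parse_substitution_box(seq, "MRTAYIAG"))
```

Checking for integers first and for the substitution pattern second keeps the sequence mode as the fallback, so a malformed substitution ends up in the length check rather than being silently scored.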
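
The difference between the two scoring strategies can be sketched as follows, assuming that a substitution is scored as the log-odds of the mutant residue over the wild-type residue at the mutated position, as described in the paper linked above. Everything named here is hypothetical: `LogProbFn` stands in for a forward pass of an ESM model, `toy_model` is a placeholder with no biological meaning that only makes the example runnable, and the `MASK` string is an assumed mask token. Positions are 0-based in this sketch.

```python
import math
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # assumed mask token

# Hypothetical stand-in for a model forward pass: takes a list of tokens and
# returns, for every position, log-probabilities over the amino acids.
LogProbFn = Callable[[List[str]], List[Dict[str, float]]]


def wt_marginal_score(model: LogProbFn, seq: str, pos: int, mt: str) -> float:
    """Score from a single pass over the unmasked wild-type sequence."""
    logp = model(list(seq))[pos]
    return logp[mt] - logp[seq[pos]]


def masked_marginal_score(model: LogProbFn, seq: str, pos: int, mt: str) -> float:
    """Score from a pass in which the mutated position is masked."""
    tokens = list(seq)
    tokens[pos] = MASK
    logp = model(tokens)[pos]
    return logp[mt] - logp[seq[pos]]


def toy_model(tokens: List[str]) -> List[Dict[str, float]]:
    """Placeholder model: favours the visible residue, uniform when masked."""
    output = []
    for tok in tokens:
        if tok in AMINO_ACIDS:
            probs = {aa: (0.9 if aa == tok else 0.1 / 19) for aa in AMINO_ACIDS}
        else:
            probs = {aa: 1 / 20 for aa in AMINO_ACIDS}
        output.append({aa: math.log(p) for aa, p in probs.items()})
    return output


if __name__ == "__main__":
    seq = "MKTAYIAK"
    print(wt_marginal_score(toy_model, seq, 1, "R"))
    print(masked_marginal_score(toy_model, seq, 1, "R"))
```

In this sketch the wild-type pass can be reused for every position, whereas the masked variant needs a separate pass for each masked position, which is consistent with the run time growing with sequence length.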