---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card

This model predicts the fitness of GB1 protein variants.

### Task type
Protein-level regression

### Dataset description
The dataset is from:

Nicholas C. Wu, Lei Dai, C. Anders Olson, James O. Lloyd-Smith, Ren Sun (2016). Adaptation in protein fitness landscapes is facilitated by indirect paths. *eLife* 5:e16965.
https://doi.org/10.7554/eLife.16965

Each label is the fitness of a mutant protein, expressed relative to the wildtype, so the wildtype has fitness 1.
All labels are greater than 0: a label above 1 means the variant is fitter than the wildtype, and a label below 1 means it is less fit.
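
As a minimal sketch of how the labels read, assuming each label is simply a variant's raw fitness score divided by the wildtype's raw score (the cited paper describes how the raw scores are derived; the helper names and numbers below are hypothetical):

```python
def relative_fitness(variant_score: float, wildtype_score: float) -> float:
    """Normalise a raw fitness score so that the wildtype maps to exactly 1.0."""
    if wildtype_score <= 0:
        raise ValueError("wildtype score must be positive")
    return variant_score / wildtype_score


def compare_to_wildtype(label: float) -> str:
    """Interpret a relative-fitness label against the wildtype value of 1."""
    if label > 1:
        return "fitter than wildtype"
    if label < 1:
        return "less fit than wildtype"
    return "wildtype-like"


# Hypothetical raw scores; only the ratios to the wildtype matter for the labels.
raw_scores = {"WT": 2.0, "variant_a": 3.0, "variant_b": 0.5}
for name, raw in raw_scores.items():
    label = relative_fitness(raw, raw_scores["WT"])
    print(name, label, compare_to_wildtype(label))
```

Dividing by the wildtype score makes labels comparable across experiments: the absolute scale drops out and only the ratio to the wildtype remains.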

### Model input type
Amino acid sequence

### Performance
test_spearman: 0.54

test_pearson: 0.98
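
The card does not state how these metrics were computed. A minimal sketch using the standard definitions, assuming Pearson correlation on raw values and Spearman correlation as the Pearson correlation of ranks with ties averaged (the convention used by `scipy.stats.spearmanr`), shows how the two can diverge. With the hypothetical numbers below, one large label dominates Pearson while the predicted ordering of the many near-zero labels is wrong.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical labels and predictions: the single large value is predicted well,
# but the four near-zero labels are predicted in exactly reversed order.
labels = [0.01, 0.02, 0.03, 0.04, 5.0]
preds = [0.04, 0.03, 0.02, 0.01, 5.0]
print(pearson(labels, preds), spearman(labels, preds))
```

Here `pearson(labels, preds)` is close to 1 while `spearman(labels, preds)` is 0. This is only a toy illustration of how the metrics differ, not an explanation of the reported numbers.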

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
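
In the `peft` library these settings correspond to the `lora_alpha`, `lora_dropout`, `target_modules`, and `modules_to_save` fields of `LoraConfig`. The card does not state the rank `r` (the `peft` default is 8). The sketch below is plain Python rather than `peft` code and only illustrates what an adapted layer computes: the frozen weight `W` plus a low-rank update `B A` scaled by `lora_alpha / r`, with no dropout because `lora_dropout` is 0.0. The toy matrices and the rank of 2 are hypothetical.

```python
def matvec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def lora_forward(x, w, a, b, lora_alpha: float, r: int) -> list[float]:
    """Compute W x + (lora_alpha / r) * B (A x).

    w: frozen base weight (d_out x d_in); a: r x d_in; b: d_out x r.
    No dropout is applied, matching lora_dropout = 0.0.
    """
    base = matvec(w, x)               # frozen pretrained path
    update = matvec(b, matvec(a, x))  # low-rank trainable path
    scale = lora_alpha / r
    return [yb + scale * yu for yb, yu in zip(base, update)]


# Hypothetical toy layer with d_in = d_out = 2 and rank r = 2.
x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
B_init = [[0.0, 0.0], [0.0, 0.0]]     # zero-initialised B: adapter starts as a no-op
B_trained = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical values after some training

print(lora_forward(x, W, A, B_init, lora_alpha=16, r=2))
print(lora_forward(x, W, A, B_trained, lora_alpha=16, r=2))
```

Because `peft` initializes `B` to zeros by default, an adapted layer starts out identical to the base layer. Only the layers named in `target_modules` receive adapters, while the `classifier` head listed in `modules_to_save` is trained in full and saved with the adapter.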

### Training config
optimizer: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-3

epochs: 20

batch size: 1000

precision: 16-mixed
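
A minimal single-parameter sketch of one AdamW update with the hyperparameters above, following the decoupled weight-decay formulation used by `torch.optim.AdamW` (decay applied directly to the parameter, then the bias-corrected Adam step). The `eps` value of 1e-8 is an assumption (PyTorch's default); the card does not list it.

```python
import math


def adamw_step(param: float, grad: float, m: float, v: float, t: int,
               lr: float = 1e-3, betas: tuple = (0.9, 0.98),
               eps: float = 1e-8, weight_decay: float = 0.01):
    """Apply one AdamW update at step t (1-based); return (param, m, v)."""
    beta1, beta2 = betas
    # Decoupled weight decay: shrink the parameter directly, independent of the gradient.
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for initialising m and v at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical starting parameter and gradient.
param, m, v = 1.0, 0.0, 0.0
param, m, v = adamw_step(param, grad=0.5, m=m, v=v, t=1)
print(param)
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is about 1 regardless of the gradient's scale, so the parameter moves by roughly the learning rate (1e-3) plus the small weight-decay shrinkage.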