---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
This model predicts the fitness of variants of the β-subunit of tryptophan synthase (TrpB).
TrpB synthesizes L-tryptophan (Trp) from indole and L-serine (Ser).
The TrpB variant Tm9D8*, derived from the hyperthermophile Thermotoga maritima, was selected as the parent enzyme.
Tm9D8* differs from wildtype TmTrpB by ten amino acid substitutions (P19G, E30G, I69V, K96L, P140L, N167D, I184F, L213P, G228S, and T292S).
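
The substitution codes above follow the usual `<wildtype residue><position><mutant residue>` notation. As a minimal sketch of how such codes can be parsed and applied to a sequence, assuming 1-based positions: the helper names `parse_substitution` and `apply_substitutions` are hypothetical, and the five-residue sequence is a toy example, not the TmTrpB sequence.

```python
import re

# Substitutions separating Tm9D8* from wildtype TmTrpB, as listed in this card.
TM9D8_SUBSTITUTIONS = ["P19G", "E30G", "I69V", "K96L", "P140L",
                       "N167D", "I184F", "L213P", "G228S", "T292S"]

_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'P19G' into (wildtype, 1-based position, mutant)."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wildtype, position, mutant = match.groups()
    return wildtype, int(position), mutant


def apply_substitutions(sequence, codes):
    """Return a copy of `sequence` with every substitution applied.

    Raises ValueError if the residue found at a position does not match
    the wildtype residue named in the code.
    """
    residues = list(sequence)
    for code in codes:
        wildtype, position, mutant = parse_substitution(code)
        index = position - 1  # codes are 1-based, Python strings are 0-based
        if index >= len(residues) or residues[index] != wildtype:
            raise ValueError(f"{code}: sequence does not have {wildtype} at {position}")
        residues[index] = mutant
    return "".join(residues)


# Toy 5-residue sequence (hypothetical, NOT the TmTrpB sequence).
toy_variant = apply_substitutions("MKPLE", ["P3G", "E5D"])
parsed_parent = [parse_substitution(code) for code in TM9D8_SUBSTITUTIONS]
```

Checking the wildtype residue before substituting catches off-by-one errors and mismatched reference sequences early.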

### Task type
Protein-level regression

### Dataset description
The dataset is from [A combinatorially complete epistatic fitness landscape in an enzyme active site](https://www.biorxiv.org/content/10.1101/2024.06.23.600144v1).

The dataset can also be found at [SaProtHub dataset](https://huggingface.co/datasets/SaProtHub/TrpB_fitness_landsacpe_dataset).

The label is the fitness of each variant, represented here by the growth rate of the E. coli strain. The maximum fitness is 1; the closer a value is to 1, the higher the fitness.

### Model input type
Amino acid sequence

### Performance
test_pearson: 0.93

test_spearman: 0.38
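
For reference, the two metrics measure different things: Pearson correlation captures linear agreement between predictions and labels, while Spearman correlation is the Pearson correlation of their ranks. The sketch below is a plain-Python illustration and not the evaluation code behind the numbers above. The toy labels and predictions are invented to show that the two metrics can diverge, for example when many variants cluster at similar fitness values and are ranked poorly among themselves.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented toy data: six low-fitness variants ranked in reverse order,
# two high-fitness variants predicted well.
toy_labels = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.90, 1.00]
toy_preds = [0.05, 0.04, 0.03, 0.02, 0.01, 0.00, 0.85, 0.95]

toy_pearson = pearson(toy_labels, toy_preds)
toy_spearman = spearman(toy_labels, toy_preds)
```

On this toy data the Pearson value is high while the Spearman value is low, because the few high-fitness points dominate the linear fit but the ordering within the low-fitness cluster is wrong.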

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
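
The field names above match keyword arguments of `LoraConfig` in the `peft` library. A minimal sketch of collecting them in Python follows, assuming `peft` may or may not be installed. The LoRA rank (`r`) is not stated in this card, so the sketch does not set it; the variable names `lora_kwargs` and `lora_config` are illustrative.

```python
# Values copied from the "LoRA config" section of this card.
# The LoRA rank (`r`) is not stated here, so it is deliberately left out.
lora_kwargs = {
    "lora_dropout": 0.0,
    "lora_alpha": 16,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}

try:
    from peft import LoraConfig
except ImportError:
    # peft is optional for this sketch; without it only the dict is built.
    lora_config = None
else:
    # Caution: omitting `r` makes peft fall back to its own default rank,
    # which may differ from the rank this adapter was trained with.
    lora_config = LoraConfig(**lora_kwargs)
```

When loading the published adapter, the configuration stored with the adapter should take precedence over a hand-written one like this.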

### Training config
optimizer class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 5e-4

epoch: 100

batch size: 100

precision: 16-mixed
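
To make the optimizer settings concrete, here is a minimal plain-Python sketch of a single AdamW update on one scalar parameter, using the learning rate, betas, and weight decay listed above. It is an illustration of the update rule, not the training code for this model. The epsilon value is an assumption (it is not stated in this card), and `adamw_step` is a hypothetical helper name.

```python
import math

# Hyperparameters from the "Training config" section of this card.
LEARNING_RATE = 5e-4
BETAS = (0.9, 0.98)
WEIGHT_DECAY = 0.01
EPS = 1e-8  # assumption: not stated in the card; a common default


def adamw_step(param, grad, m, v, step):
    """Apply one AdamW update to a scalar parameter; `step` starts at 1."""
    beta1, beta2 = BETAS
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1.0 - LEARNING_RATE * WEIGHT_DECAY)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


# One update from param = 1.0 with gradient 1.0 and empty optimizer state.
param, m, v = adamw_step(param=1.0, grad=1.0, m=0.0, v=0.0, step=1)
```

In practice a framework optimizer would be used instead; the sketch only shows how the weight decay term is applied separately from the gradient-based step.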