gravelcompbio committed on
Commit
f04feb9
1 Parent(s): 6ced301

Upload 4 files

LICENSE ADDED
@@ -0,0 +1,402 @@
+ Attribution-NonCommercial-NoDerivatives 4.0 International
+
+ =======================================================================
+
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
+ does not provide legal services or legal advice. Distribution of
+ Creative Commons public licenses does not create a lawyer-client or
+ other relationship. Creative Commons makes its licenses and related
+ information available on an "as-is" basis. Creative Commons gives no
+ warranties regarding its licenses, any material licensed under their
+ terms and conditions, or any related information. Creative Commons
+ disclaims all liability for damages resulting from their use to the
+ fullest extent possible.
+
+ Using Creative Commons Public Licenses
+
+ Creative Commons public licenses provide a standard set of terms and
+ conditions that creators and other rights holders may use to share
+ original works of authorship and other material subject to copyright
+ and certain other rights specified in the public license below. The
+ following considerations are for informational purposes only, are not
+ exhaustive, and do not form part of our licenses.
+
+ Considerations for licensors: Our public licenses are
+ intended for use by those authorized to give the public
+ permission to use material in ways otherwise restricted by
+ copyright and certain other rights. Our licenses are
+ irrevocable. Licensors should read and understand the terms
+ and conditions of the license they choose before applying it.
+ Licensors should also secure all rights necessary before
+ applying our licenses so that the public can reuse the
+ material as expected. Licensors should clearly mark any
+ material not subject to the license. This includes other CC-
+ licensed material, or material used under an exception or
+ limitation to copyright. More considerations for licensors:
+ wiki.creativecommons.org/Considerations_for_licensors
+
+ Considerations for the public: By using one of our public
+ licenses, a licensor grants the public permission to use the
+ licensed material under specified terms and conditions. If
+ the licensor's permission is not necessary for any reason--for
+ example, because of any applicable exception or limitation to
+ copyright--then that use is not regulated by the license. Our
+ licenses grant only permissions under copyright and certain
+ other rights that a licensor has authority to grant. Use of
+ the licensed material may still be restricted for other
+ reasons, including because others have copyright or other
+ rights in the material. A licensor may make special requests,
+ such as asking that all changes be marked or described.
+ Although not required by our licenses, you are encouraged to
+ respect those requests where reasonable. More considerations
+ for the public:
+ wiki.creativecommons.org/Considerations_for_licensees
+
+ =======================================================================
+
+ Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
+ International Public License
+
+ By exercising the Licensed Rights (defined below), You accept and agree
+ to be bound by the terms and conditions of this Creative Commons
+ Attribution-NonCommercial-NoDerivatives 4.0 International Public
+ License ("Public License"). To the extent this Public License may be
+ interpreted as a contract, You are granted the Licensed Rights in
+ consideration of Your acceptance of these terms and conditions, and the
+ Licensor grants You such rights in consideration of benefits the
+ Licensor receives from making the Licensed Material available under
+ these terms and conditions.
+
+
+ Section 1 -- Definitions.
+
+ a. Adapted Material means material subject to Copyright and Similar
+ Rights that is derived from or based upon the Licensed Material
+ and in which the Licensed Material is translated, altered,
+ arranged, transformed, or otherwise modified in a manner requiring
+ permission under the Copyright and Similar Rights held by the
+ Licensor. For purposes of this Public License, where the Licensed
+ Material is a musical work, performance, or sound recording,
+ Adapted Material is always produced where the Licensed Material is
+ synched in timed relation with a moving image.
+
+ b. Copyright and Similar Rights means copyright and/or similar rights
+ closely related to copyright including, without limitation,
+ performance, broadcast, sound recording, and Sui Generis Database
+ Rights, without regard to how the rights are labeled or
+ categorized. For purposes of this Public License, the rights
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
+ Rights.
+
+ c. Effective Technological Measures means those measures that, in the
+ absence of proper authority, may not be circumvented under laws
+ fulfilling obligations under Article 11 of the WIPO Copyright
+ Treaty adopted on December 20, 1996, and/or similar international
+ agreements.
+
+ d. Exceptions and Limitations means fair use, fair dealing, and/or
+ any other exception or limitation to Copyright and Similar Rights
+ that applies to Your use of the Licensed Material.
+
+ e. Licensed Material means the artistic or literary work, database,
+ or other material to which the Licensor applied this Public
+ License.
+
+ f. Licensed Rights means the rights granted to You subject to the
+ terms and conditions of this Public License, which are limited to
+ all Copyright and Similar Rights that apply to Your use of the
+ Licensed Material and that the Licensor has authority to license.
+
+ g. Licensor means the individual(s) or entity(ies) granting rights
+ under this Public License.
+
+ h. NonCommercial means not primarily intended for or directed towards
+ commercial advantage or monetary compensation. For purposes of
+ this Public License, the exchange of the Licensed Material for
+ other material subject to Copyright and Similar Rights by digital
+ file-sharing or similar means is NonCommercial provided there is
+ no payment of monetary compensation in connection with the
+ exchange.
+
+ i. Share means to provide material to the public by any means or
+ process that requires permission under the Licensed Rights, such
+ as reproduction, public display, public performance, distribution,
+ dissemination, communication, or importation, and to make material
+ available to the public including in ways that members of the
+ public may access the material from a place and at a time
+ individually chosen by them.
+
+ j. Sui Generis Database Rights means rights other than copyright
+ resulting from Directive 96/9/EC of the European Parliament and of
+ the Council of 11 March 1996 on the legal protection of databases,
+ as amended and/or succeeded, as well as other essentially
+ equivalent rights anywhere in the world.
+
+ k. You means the individual or entity exercising the Licensed Rights
+ under this Public License. Your has a corresponding meaning.
+
+
+ Section 2 -- Scope.
+
+ a. License grant.
+
+ 1. Subject to the terms and conditions of this Public License,
+ the Licensor hereby grants You a worldwide, royalty-free,
+ non-sublicensable, non-exclusive, irrevocable license to
+ exercise the Licensed Rights in the Licensed Material to:
+
+ a. reproduce and Share the Licensed Material, in whole or
+ in part, for NonCommercial purposes only; and
+
+ b. produce and reproduce, but not Share, Adapted Material
+ for NonCommercial purposes only.
+
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
+ Exceptions and Limitations apply to Your use, this Public
+ License does not apply, and You do not need to comply with
+ its terms and conditions.
+
+ 3. Term. The term of this Public License is specified in Section
+ 6(a).
+
+ 4. Media and formats; technical modifications allowed. The
+ Licensor authorizes You to exercise the Licensed Rights in
+ all media and formats whether now known or hereafter created,
+ and to make technical modifications necessary to do so. The
+ Licensor waives and/or agrees not to assert any right or
+ authority to forbid You from making technical modifications
+ necessary to exercise the Licensed Rights, including
+ technical modifications necessary to circumvent Effective
+ Technological Measures. For purposes of this Public License,
+ simply making modifications authorized by this Section 2(a)
+ (4) never produces Adapted Material.
+
+ 5. Downstream recipients.
+
+ a. Offer from the Licensor -- Licensed Material. Every
+ recipient of the Licensed Material automatically
+ receives an offer from the Licensor to exercise the
+ Licensed Rights under the terms and conditions of this
+ Public License.
+
+ b. No downstream restrictions. You may not offer or impose
+ any additional or different terms or conditions on, or
+ apply any Effective Technological Measures to, the
+ Licensed Material if doing so restricts exercise of the
+ Licensed Rights by any recipient of the Licensed
+ Material.
+
+ 6. No endorsement. Nothing in this Public License constitutes or
+ may be construed as permission to assert or imply that You
+ are, or that Your use of the Licensed Material is, connected
+ with, or sponsored, endorsed, or granted official status by,
+ the Licensor or others designated to receive attribution as
+ provided in Section 3(a)(1)(A)(i).
+
+ b. Other rights.
+
+ 1. Moral rights, such as the right of integrity, are not
+ licensed under this Public License, nor are publicity,
+ privacy, and/or other similar personality rights; however, to
+ the extent possible, the Licensor waives and/or agrees not to
+ assert any such rights held by the Licensor to the limited
+ extent necessary to allow You to exercise the Licensed
+ Rights, but not otherwise.
+
+ 2. Patent and trademark rights are not licensed under this
+ Public License.
+
+ 3. To the extent possible, the Licensor waives any right to
+ collect royalties from You for the exercise of the Licensed
+ Rights, whether directly or through a collecting society
+ under any voluntary or waivable statutory or compulsory
+ licensing scheme. In all other cases the Licensor expressly
+ reserves any right to collect such royalties, including when
+ the Licensed Material is used other than for NonCommercial
+ purposes.
+
+
+ Section 3 -- License Conditions.
+
+ Your exercise of the Licensed Rights is expressly made subject to the
+ following conditions.
+
+ a. Attribution.
+
+ 1. If You Share the Licensed Material, You must:
+
+ a. retain the following if it is supplied by the Licensor
+ with the Licensed Material:
+
+ i. identification of the creator(s) of the Licensed
+ Material and any others designated to receive
+ attribution, in any reasonable manner requested by
+ the Licensor (including by pseudonym if
+ designated);
+
+ ii. a copyright notice;
+
+ iii. a notice that refers to this Public License;
+
+ iv. a notice that refers to the disclaimer of
+ warranties;
+
+ v. a URI or hyperlink to the Licensed Material to the
+ extent reasonably practicable;
+
+ b. indicate if You modified the Licensed Material and
+ retain an indication of any previous modifications; and
+
+ c. indicate the Licensed Material is licensed under this
+ Public License, and include the text of, or the URI or
+ hyperlink to, this Public License.
+
+ For the avoidance of doubt, You do not have permission under
+ this Public License to Share Adapted Material.
+
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
+ reasonable manner based on the medium, means, and context in
+ which You Share the Licensed Material. For example, it may be
+ reasonable to satisfy the conditions by providing a URI or
+ hyperlink to a resource that includes the required
+ information.
+
+ 3. If requested by the Licensor, You must remove any of the
+ information required by Section 3(a)(1)(A) to the extent
+ reasonably practicable.
+
+
+ Section 4 -- Sui Generis Database Rights.
+
+ Where the Licensed Rights include Sui Generis Database Rights that
+ apply to Your use of the Licensed Material:
+
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+ to extract, reuse, reproduce, and Share all or a substantial
+ portion of the contents of the database for NonCommercial purposes
+ only and provided You do not Share Adapted Material;
+
+ b. if You include all or a substantial portion of the database
+ contents in a database in which You have Sui Generis Database
+ Rights, then the database in which You have Sui Generis Database
+ Rights (but not its individual contents) is Adapted Material; and
+
+ c. You must comply with the conditions in Section 3(a) if You Share
+ all or a substantial portion of the contents of the database.
+
+ For the avoidance of doubt, this Section 4 supplements and does not
+ replace Your obligations under this Public License where the Licensed
+ Rights include other Copyright and Similar Rights.
+
+
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+ c. The disclaimer of warranties and limitation of liability provided
+ above shall be interpreted in a manner that, to the extent
+ possible, most closely approximates an absolute disclaimer and
+ waiver of all liability.
+
+
+ Section 6 -- Term and Termination.
+
+ a. This Public License applies for the term of the Copyright and
+ Similar Rights licensed here. However, if You fail to comply with
+ this Public License, then Your rights under this Public License
+ terminate automatically.
+
+ b. Where Your right to use the Licensed Material has terminated under
+ Section 6(a), it reinstates:
+
+ 1. automatically as of the date the violation is cured, provided
+ it is cured within 30 days of Your discovery of the
+ violation; or
+
+ 2. upon express reinstatement by the Licensor.
+
+ For the avoidance of doubt, this Section 6(b) does not affect any
+ right the Licensor may have to seek remedies for Your violations
+ of this Public License.
+
+ c. For the avoidance of doubt, the Licensor may also offer the
+ Licensed Material under separate terms or conditions or stop
+ distributing the Licensed Material at any time; however, doing so
+ will not terminate this Public License.
+
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+ License.
+
+
+ Section 7 -- Other Terms and Conditions.
+
+ a. The Licensor shall not be bound by any additional or different
+ terms or conditions communicated by You unless expressly agreed.
+
+ b. Any arrangements, understandings, or agreements regarding the
+ Licensed Material not stated herein are separate from and
+ independent of the terms and conditions of this Public License.
+
+
+ Section 8 -- Interpretation.
+
+ a. For the avoidance of doubt, this Public License does not, and
+ shall not be interpreted to, reduce, limit, restrict, or impose
+ conditions on any use of the Licensed Material that could lawfully
+ be made without permission under this Public License.
+
+ b. To the extent possible, if any provision of this Public License is
+ deemed unenforceable, it shall be automatically reformed to the
+ minimum extent necessary to make it enforceable. If the provision
+ cannot be reformed, it shall be severed from this Public License
+ without affecting the enforceability of the remaining terms and
+ conditions.
+
+ c. No term or condition of this Public License will be waived and no
+ failure to comply consented to unless expressly agreed to by the
+ Licensor.
+
+ d. Nothing in this Public License constitutes or may be interpreted
+ as a limitation upon, or waiver of, any privileges and immunities
+ that apply to the Licensor or You, including from the legal
+ processes of any jurisdiction or authority.
+
+ =======================================================================
+
+ Creative Commons is not a party to its public
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
+ its public licenses to material it publishes and in those instances
+ will be considered the “Licensor.” The text of the Creative Commons
+ public licenses is dedicated to the public domain under the CC0 Public
+ Domain Dedication. Except for the limited purpose of indicating that
+ material is shared under a Creative Commons public license or as
+ otherwise permitted by the Creative Commons policies published at
+ creativecommons.org/policies, Creative Commons does not authorize the
+ use of the trademark "Creative Commons" or any other trademark or logo
+ of Creative Commons without its prior written consent including,
+ without limitation, in connection with any unauthorized modifications
+ to any of its public licenses or any other arrangements,
+ understandings, or agreements concerning use of licensed material. For
+ the avoidance of doubt, this paragraph does not form part of the
+ public licenses.
+
+ Creative Commons may be contacted at creativecommons.org.
multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90.txt ADDED
@@ -0,0 +1 @@
+ https://zenodo.org/record/8170005
phos-ST_Example_Code.ipynb ADDED
@@ -0,0 +1,312 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "b1b4e507-0ca2-4309-ac5c-0461f99edc72",
+ "metadata": {},
+ "source": [
+ "# Phosformer-ST Example Code"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3a23dd26-2060-4cb1-a1a0-dd97b168a329",
+ "metadata": {},
+ "source": [
+ "## Imports"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ec3bd89c-8aa1-408c-b569-c89dc2bb768d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import sys\n",
+ "import hashlib\n",
+ "import warnings\n",
+ "sys.dont_write_bytecode=True\n",
+ "\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "import torch\n",
+ "\n",
+ "from tokenization_esm import EsmTokenizer\n",
+ "from modeling_esm import EsmForSequenceClassificationMHACustom\n",
+ "# for versioning specifics, see the README\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a42e87fc-bd23-4b7b-8234-06cbdcb25bc0",
+ "metadata": {},
+ "source": [
+ "## Loading the pre-trained model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "9def3d4b-822d-44b8-a896-3e6ee5aca13d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_dir = 'multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90'\n",
+ "\n",
+ "tokenizer = EsmTokenizer.from_pretrained(model_dir)\n",
+ "model = EsmForSequenceClassificationMHACustom.from_pretrained(model_dir, num_labels=2)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ff3f7f18-6cb2-4818-9583-bb729e848b81",
+ "metadata": {},
+ "source": [
+ "## Configuring the parameters of the Phos-ST model\n",
+ "\n",
+ "## Organizing the input data for Phos-ST"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "1dbece0a-39a3-4932-8781-a679dd699587",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def run_model(peptides, kinases, model=model, tokenizer=tokenizer, device='cuda', batch_size=50, output_hidden_states=True, output_attentions=True):\n",
+ "    torch.cuda.empty_cache()\n",
+ "    \n",
+ "    model.eval()\n",
+ "    model = model.to(device)\n",
+ "    \n",
+ "    size = len(peptides)\n",
+ "    breaks = set(np.cumsum([batch_size]*(size//batch_size)+[size%batch_size])-1)\n",
+ "\n",
+ "    pairs = []\n",
+ "    for n, pair in enumerate(zip(peptides, kinases)):\n",
+ "        sys.stderr.write(f'{1+n}\\r')\n",
+ "        pairs += [pair]\n",
+ "        if n in breaks:\n",
+ "            \n",
+ "            output = dict(zip(('peptide','kinase'),zip(*pairs)))\n",
+ "            ids = tokenizer(pairs, padding=True, return_tensors='pt')\n",
+ "            ids = ids.to(device)\n",
+ "            \n",
+ "            with torch.no_grad():\n",
+ "                results, classifier_attn_outputs, classifier_attn_output_weights = model(ids['input_ids'], \n",
+ "                    attention_mask=ids['attention_mask'], \n",
+ "                    output_hidden_states=output_hidden_states, \n",
+ "                    output_attentions=output_attentions)\n",
+ "            \n",
+ "            attention_mask = ids['attention_mask'].cpu().type(torch.bool)\n",
+ "\n",
+ "            output['probability'] = results['logits'].softmax(1)[:,1].cpu().numpy()\n",
+ "            \n",
+ "            if output_hidden_states:\n",
+ "                last_embeddings = results['hidden_states'][-1].cpu().numpy()\n",
+ "                output['embedding'] = [i[m] for i, m in zip(last_embeddings, attention_mask)]\n",
+ "            \n",
+ "            if output_attentions:\n",
+ "                last_attentions = results['attentions'][-1].cpu().numpy()\n",
+ "                output['attention'] = [i[:,m,:][:,:,m] for i, m in zip(last_attentions, attention_mask)]\n",
+ "            \n",
+ "            classifier_attn_outputs = classifier_attn_outputs.cpu()\n",
+ "            output['classifier_attn_outputs'] = classifier_attn_outputs\n",
+ "\n",
+ "            classifier_attn_output_weights = classifier_attn_output_weights.cpu()\n",
+ "            output['classifier_attn_output_weights'] = [i[:,m[16:]] for i, m in zip(classifier_attn_output_weights, attention_mask)]\n",
+ "            \n",
+ "            keys = output.keys()\n",
+ "            for data in zip(*(output[k] for k in keys)):\n",
+ "                yield dict(zip(keys, data))\n",
+ "            \n",
+ "            pairs = []\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3fbfc05d-970c-4db9-bae8-61ea2ffb06af",
+ "metadata": {},
+ "source": [
+ "## Helper function to run Phos-ST"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "98e90bc6-28db-449c-8a0b-805ef22cd9ec",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# This could be modified to take lists of substrates and kinase domains:\n",
+ "# drop the square brackets around substrate15mer and kinaseDomainSeq in the first and second arguments passed to run_model\n",
+ "def phosST(kinaseDomainSeq,substrate15mer):\n",
+ "    job = run_model(\n",
+ "        [substrate15mer],\n",
+ "        [kinaseDomainSeq],\n",
+ "        model=model, \n",
+ "        tokenizer=tokenizer, \n",
+ "        device='cuda', \n",
+ "        batch_size=10,\n",
+ "        output_hidden_states=False,\n",
+ "        output_attentions=False,\n",
+ "    )\n",
+ "    \n",
+ "    #total = dataset.shape[0]\n",
+ "    results = {\n",
+ "        'kinase' : [],\n",
+ "        'peptide' : [],\n",
+ "        'prob' : [],\n",
+ "    }\n",
+ "\n",
+ "    \n",
+ "    for n, i in enumerate(job):\n",
+ "        #sys.stderr.write(f'{n+1} / {total}\\r')\n",
+ "        results['kinase' ] += [i['kinase']]\n",
+ "        results['peptide'] += [i['peptide']]\n",
+ "        results['prob' ] += [i['probability']]\n",
+ "    \n",
+ "    result = pd.DataFrame(results)\n",
+ "    print(\"The Predictive score is \"+str(i['probability']))\n",
+ "    \n",
+ "    return result\n",
+ "    "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "151c217b-1ee1-4cf7-b41f-b52f5ce22719",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ee511f8b-5d9e-4c8a-8191-bd2a7fd3a5e9",
+ "metadata": {},
+ "source": [
+ "# Positive Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5bd6e8ee-444e-49d5-a617-d2343759759a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# P17612 KAPCA_HUMAN\n",
+ "kinDomain=\"FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF\"\n",
+ "# P53602_S96_LARKRRNSRDGDPLP\n",
+ "substrate=\"LARKRRNSRDGDPLP\"\n",
+ "\n",
+ "phosST(kinDomain,substrate).to_csv('PositiveExample.csv')\n",
+ "# the score is also written to the CSV file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f1f27f0f-5bda-4107-adef-a8712ace540c",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f432840c-f56e-40f2-959f-157dc65f57d6",
+ "metadata": {},
+ "source": [
+ "# Negative Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "41e2c0de-9088-4cf1-a744-a451ce19d7a6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# P17612 KAPCA_HUMAN\n",
+ "kinDomain=\"FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF\"\n",
+ "# 'Q01831_T169_PVEIEIETPEQAKTR'\n",
+ "substrate=\"PVEIEIETPEQAKTR\"\n",
+ "\n",
+ "phosST(kinDomain,substrate).to_csv('NegativeExample.csv')\n",
+ "# the score is also written to the CSV file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ac3b5b10-3cde-4f66-ba7a-f137538fa880",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85509eea-3217-492f-bf77-9da8ee123b76",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9bfe47df-7e6b-487b-92b6-33ba2d9c6eb7",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c6c2b239-f7d1-418b-bd1d-916fb1db8933",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5af09564-6b23-4dea-a0a3-76bc8362b7b4",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.16"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }
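The following is a minimal NumPy-only sketch of two steps in the notebook's `run_model`. The first is how it decides when to flush a batch of (peptide, kinase) pairs to the model, using the same `breaks` formula. The second is how it turns the two class logits into the `probability` column, mirroring `softmax(1)[:,1]`. The function names `batch_break_indices`, `iter_batches`, and `positive_probability` are hypothetical helpers for illustration. Reading column 1 as the positive class is an assumption taken from the notebook's indexing, not from a documented API. No model weights are needed to run it.

```python
import numpy as np


def batch_break_indices(size, batch_size):
    """Hypothetical helper: indices at which run_model flushes a batch (same formula as the notebook)."""
    # Full batches, then one trailing partial batch (length 0 when size is a multiple of batch_size).
    counts = [batch_size] * (size // batch_size) + [size % batch_size]
    return set((np.cumsum(counts) - 1).tolist())


def iter_batches(peptides, kinases, batch_size):
    """Hypothetical helper: yield lists of (peptide, kinase) pairs at the same flush points as run_model."""
    breaks = batch_break_indices(len(peptides), batch_size)
    pairs = []
    for n, pair in enumerate(zip(peptides, kinases)):
        pairs.append(pair)
        if n in breaks:
            yield pairs
            pairs = []  # start a fresh batch after each flush


def positive_probability(logits):
    """Hypothetical helper: softmax over two class logits, keeping column 1 (assumed positive class)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp[:, 1] / exp.sum(axis=1)
```

With `phosST`'s single pair and `batch_size=10`, `batch_break_indices(1, 10)` returns `{0}`, so the lone pair is scored in one batch. When the input length is an exact multiple of `batch_size`, the zero-length trailing batch only repeats the last index, and the `set` absorbs the duplicate.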
phosST.yml ADDED
@@ -0,0 +1,188 @@
+ name: phosST
+ channels:
+ - conda-forge
+ - defaults
+ dependencies:
+ - _libgcc_mutex=0.1=main
+ - _openmp_mutex=5.1=1_gnu
+ - anyio=3.6.2=pyhd8ed1ab_0
+ - argon2-cffi=21.3.0=pyhd8ed1ab_0
+ - argon2-cffi-bindings=21.2.0=py39hb9d737c_2
+ - asttokens=2.2.1=pyhd8ed1ab_0
+ - async-lru=2.0.2=pyhd8ed1ab_0
+ - attrs=23.1.0=pyh71513ae_1
+ - babel=2.12.1=pyhd8ed1ab_1
+ - backcall=0.2.0=pyh9f0ad1d_0
+ - backports=1.0=pyhd8ed1ab_3
+ - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
+ - beautifulsoup4=4.12.2=pyha770c72_0
+ - bleach=6.0.0=pyhd8ed1ab_0
+ - brotli=1.0.9=h166bdaf_7
+ - brotli-bin=1.0.9=h166bdaf_7
+ - ca-certificates=2023.5.7=hbcca054_0
+ - certifi=2023.5.7=pyhd8ed1ab_0
+ - cffi=1.15.0=py39h4bc2ebd_0
+ - charset-normalizer=3.1.0=pyhd8ed1ab_0
+ - decorator=5.1.1=pyhd8ed1ab_0
+ - defusedxml=0.7.1=pyhd8ed1ab_0
+ - entrypoints=0.4=pyhd8ed1ab_0
+ - executing=1.2.0=pyhd8ed1ab_0
+ - flit-core=3.9.0=pyhd8ed1ab_0
+ - idna=3.4=pyhd8ed1ab_0
+ - importlib-metadata=6.6.0=pyha770c72_0
+ - importlib_metadata=6.6.0=hd8ed1ab_0
+ - importlib_resources=5.12.0=pyhd8ed1ab_0
+ - ipykernel=5.5.5=py39hef51801_0
+ - ipython=8.13.2=pyh41d4057_0
+ - ipython_genutils=0.2.0=py_1
+ - jedi=0.18.2=pyhd8ed1ab_0
+ - jinja2=3.1.2=pyhd8ed1ab_1
+ - json5=0.9.5=pyh9f0ad1d_0
+ - jsonschema=4.17.3=pyhd8ed1ab_0
+ - jupyter-lsp=2.1.0=pyhd8ed1ab_0
+ - jupyter_client=8.2.0=pyhd8ed1ab_0
+ - jupyter_core=5.3.0=py39hf3d152e_0
+ - jupyter_events=0.6.3=pyhd8ed1ab_0
+ - jupyter_server=2.6.0=pyhd8ed1ab_0
+ - jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
+ - jupyterlab=4.0.0=pyhd8ed1ab_1
+ - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
+ - jupyterlab_server=2.22.1=pyhd8ed1ab_0
+ - ld_impl_linux-64=2.38=h1181459_1
+ - libbrotlicommon=1.0.9=h166bdaf_7
+ - libbrotlidec=1.0.9=h166bdaf_7
+ - libbrotlienc=1.0.9=h166bdaf_7
+ - libffi=3.4.4=h6a678d5_0
+ - libgcc-ng=11.2.0=h1234567_1
+ - libgomp=11.2.0=h1234567_1
+ - libsodium=1.0.18=h36c2ea0_1
+ - libstdcxx-ng=11.2.0=h1234567_1
+ - markupsafe=2.1.1=py39h7f8727e_0
+ - matplotlib-inline=0.1.6=pyhd8ed1ab_0
+ - mistune=2.0.5=pyhd8ed1ab_0
+ - nbclient=0.8.0=pyhd8ed1ab_0
+ - nbconvert-core=7.4.0=pyhd8ed1ab_0
+ - nbformat=5.8.0=pyhd8ed1ab_0
+ - ncurses=6.4=h6a678d5_0
+ - notebook-shim=0.2.3=pyhd8ed1ab_0
+ - openssl=1.1.1t=h7f8727e_0
+ - overrides=7.3.1=pyhd8ed1ab_0
+ - packaging=23.1=pyhd8ed1ab_0
+ - pandocfilters=1.5.0=pyhd8ed1ab_0
+ - parso=0.8.3=pyhd8ed1ab_0
+ - pexpect=4.8.0=pyh1a96a4e_2
+ - pickleshare=0.7.5=py_1003
+ - pip=23.0.1=py39h06a4308_0
+ - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0
+ - platformdirs=3.5.1=pyhd8ed1ab_0
+ - prometheus_client=0.17.0=pyhd8ed1ab_0
+ - prompt-toolkit=3.0.38=pyha770c72_0
+ - prompt_toolkit=3.0.38=hd8ed1ab_0
+ - ptyprocess=0.7.0=pyhd3deb0d_0
+ - pure_eval=0.2.2=pyhd8ed1ab_0
+ - pycparser=2.21=pyhd8ed1ab_0
+ - pygments=2.15.1=pyhd8ed1ab_0
+ - pyrsistent=0.18.0=py39heee7806_0
+ - pysocks=1.7.1=pyha2e5f31_6
+ - python=3.9.16=h7a1cb2a_2
+ - python-dateutil=2.8.2=pyhd8ed1ab_0
+ - python-fastjsonschema=2.17.1=pyhd8ed1ab_0
+ - python-json-logger=2.0.7=pyhd8ed1ab_0
+ - python_abi=3.9=2_cp39
+ - pytz=2023.3=pyhd8ed1ab_0
+ - pyyaml=6.0=py39hb9d737c_4
+ - pyzmq=25.0.2=py39h6a678d5_0
+ - readline=8.2=h5eee18b_0
+ - requests=2.31.0=pyhd8ed1ab_0
+ - rfc3339-validator=0.1.4=pyhd8ed1ab_0
+ - rfc3986-validator=0.1.1=pyh9f0ad1d_0
+ - send2trash=1.8.2=pyh41d4057_0
+ - setuptools=66.0.0=py39h06a4308_0
+ - six=1.16.0=pyh6c4a22f_0
+ - sniffio=1.3.0=pyhd8ed1ab_0
+ - soupsieve=2.3.2.post1=pyhd8ed1ab_0
+ - sqlite=3.41.2=h5eee18b_0
+ - stack_data=0.6.2=pyhd8ed1ab_0
+ - terminado=0.17.1=pyh41d4057_0
+ - tinycss2=1.2.1=pyhd8ed1ab_0
+ - tk=8.6.12=h1ccaba5_0
+ - tomli=2.0.1=pyhd8ed1ab_0
+ - tornado=6.2=py39h5eee18b_0
+ - traitlets=5.9.0=pyhd8ed1ab_0
+ - typing-extensions=4.6.2=hd8ed1ab_0
+ - typing_extensions=4.6.2=pyha770c72_0
+ - typing_utils=0.1.0=pyhd8ed1ab_0
+ - urllib3=2.0.2=pyhd8ed1ab_0
+ - wcwidth=0.2.6=pyhd8ed1ab_0
+ - webencodings=0.5.1=py_1
+ - websocket-client=1.5.2=pyhd8ed1ab_0
+ - wheel=0.38.4=py39h06a4308_0
+ - xz=5.4.2=h5eee18b_0
+ - yaml=0.2.5=h7f98852_2
+ - zeromq=4.3.4=h9c3ff4c_1
+ - zipp=3.15.0=pyhd8ed1ab_0
+ - zlib=1.2.13=h5eee18b_0
+ - pip:
+ - aiohttp==3.8.4
+ - aiosignal==1.3.1
+ - async-timeout==4.0.2
+ - blosc2==2.0.0
+ - cmake==3.26.3
+ - contourpy==1.0.7
+ - cycler==0.11.0
+ - cython==0.29.35
+ - datasets==2.12.0
+ - dill==0.3.6
+ - filelock==3.12.0
+ - fonttools==4.39.4
+ - frozenlist==1.3.3
+ - fsspec==2023.5.0
+ - huggingface-hub==0.14.1
+ - joblib==1.2.0
+ - kiwisolver==1.4.4
+ - lit==16.0.5
+ - llvmlite==0.40.1
+ - logomaker==0.8
+ - matplotlib==3.7.1
+ - mpmath==1.3.0
+ - msgpack==1.0.5
+ - multidict==6.0.4
+ - multiprocess==0.70.14
+ - networkx==3.1
+ - numba==0.57.1
+ - numexpr==2.8.4
+ - numpy==1.24.3
+ - nvidia-cublas-cu11==11.10.3.66
+ - nvidia-cuda-cupti-cu11==11.7.101
+ - nvidia-cuda-nvrtc-cu11==11.7.99
+ - nvidia-cuda-runtime-cu11==11.7.99
+ - nvidia-cudnn-cu11==8.5.0.96
+ - nvidia-cufft-cu11==10.9.0.58
+ - nvidia-curand-cu11==10.2.10.91
+ - nvidia-cusolver-cu11==11.4.0.1
+ - nvidia-cusparse-cu11==11.7.4.91
+ - nvidia-nccl-cu11==2.14.3
+ - nvidia-nvtx-cu11==11.7.91
+ - pandas==2.0.2
+ - pillow==9.5.0
+ - py-cpuinfo==9.0.0
+ - pyarrow==12.0.0
+ - pynndescent==0.5.10
+ - pyparsing==3.0.9
+ - regex==2023.5.5
+ - responses==0.18.0
+ - scikit-learn==1.2.2
+ - scipy==1.10.1
+ - sympy==1.12
+ - tables==3.8.0
+ - threadpoolctl==3.1.0
+ - tokenizers==0.13.3
+ - torch==2.0.1
+ - tqdm==4.65.0
+ - transformers==4.29.2
+ - triton==2.0.0
+ - tzdata==2023.3
+ - umap-learn==0.5.3
+ - xxhash==3.2.0
+ - yarl==1.9.2
+ prefix: /home/esbg/anaconda3/envs/phosST
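A minimal sketch for checking that an environment built from `phosST.yml` actually carries the pip pins listed above, assuming the environment was created (for example with `conda env create -f phosST.yml`) and activated before running it. The helper names `parse_pip_pins` and `check_pins` are hypothetical and not part of this repository. The parser is a deliberately naive line scan rather than a YAML parser, and it only checks the `name==version` pip entries; the conda `name=version=build` entries are skipped. Installed versions are read with the standard-library `importlib.metadata.version`.

```python
import os
from importlib.metadata import PackageNotFoundError, version


def parse_pip_pins(yml_text):
    """Collect pip pins ('name==version') from a conda environment file.

    Naive line scan, not a YAML parser: conda entries use a single '='
    (name=version=build) and are skipped because they contain no '=='.
    """
    pins = {}
    for line in yml_text.splitlines():
        item = line.strip()
        if item.startswith("- ") and "==" in item:
            name, _, ver = item[2:].partition("==")
            pins[name.strip()] = ver.strip()
    return pins


def check_pins(pins):
    """Map each mismatched package to (expected, installed); installed is None if absent."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    path = "phosST.yml"  # assumption: run from the directory containing the file
    if os.path.exists(path):
        with open(path) as fh:
            mismatches = check_pins(parse_pip_pins(fh.read()))
        for name, (want, have) in sorted(mismatches.items()):
            print(f"{name}: expected {want}, found {have}")
        print(f"{len(mismatches)} mismatched pip pin(s)")
```

An empty report means every pip pin resolved to exactly the pinned version; a `None` in the output means the package is not installed in the active interpreter at all.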