Update README.md
Update model description

README.md CHANGED

@@ -3,13 +3,15 @@ tags:
 - autotrain
 - summarization
 language:
--
+- en
 widget:
-- text:
+- text: I love AutoTrain 🤗
 datasets:
 - sagard21/autotrain-data-code-explainer
 co2_eq_emissions:
   emissions: 5.393079045128973
+license: mit
+pipeline_tag: summarization
 ---
 
 # Model Trained Using AutoTrain
@@ -18,6 +20,47 @@ co2_eq_emissions:
 - Model ID: 2745581349
 - CO2 Emissions (in grams): 5.3931
 
+# Model Description
+
+This model is an attempt to simplify code understanding by generating a line-by-line explanation of source code. It was fine-tuned from the Salesforce/codet5-large model and is currently trained on a small subset of Python snippets.
+
+# Model Usage
+
+```py
+from transformers import AutoTokenizer, T5ForConditionalGeneration, SummarizationPipeline
+import torch
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+pipeline = SummarizationPipeline(
+    model=T5ForConditionalGeneration.from_pretrained("sagard21/python-code-explainer"),
+    tokenizer=AutoTokenizer.from_pretrained("sagard21/python-code-explainer", skip_special_tokens=True),
+    device=device,
+)
+
+raw_code = """
+def preprocess(text: str) -> str:
+    text = str(text)
+    text = text.replace("\n", " ")
+    tokenized_text = text.split(" ")
+    preprocessed_text = " ".join([token for token in tokenized_text if token])
+
+    return preprocessed_text
+"""
+pipeline([raw_code])
+
+```
+
+### Expected JSON Output
+
+```
+[
+  {
+    "summary_text": "Create a function preprocess that will take the text as an argument and return the preprocessed text.\n1. In this case, the text will be converted to a string.\n2. At first, we will replace all \"\\n\" with \" \" and then split the text by \" \".\n3. Then we will call the tokenize function on the text and tokenize the text using the split() method.\n4. Next step is to create a list of all the tokens in the string and join them together.\n5. Then the function will return the string preprocessed_text.\n"
+  }
+]
+```
+
 ## Validation Metrics
 
 - Loss: 2.156
@@ -26,11 +69,3 @@ co2_eq_emissions:
 - RougeL: 25.445
 - RougeLsum: 28.084
 - Gen Len: 19.000
-
-## Usage
-
-You can use cURL to access this model:
-
-```
-$ curl -X POST -H "Authorization: Bearer YOUR_HUGGINGFACE_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/sagard21/autotrain-code-explainer-2745581349
-```
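As a quick sanity check, the `preprocess` helper that the added usage snippet feeds to the pipeline is itself self-contained, runnable Python. Running it standalone (no model download needed) shows what the model is actually asked to explain: a small whitespace-normalization function.

```python
# The sample function embedded in the README's `raw_code` string.
# It collapses newlines and runs of spaces into single spaces.
def preprocess(text: str) -> str:
    text = str(text)
    text = text.replace("\n", " ")
    tokenized_text = text.split(" ")
    # Consecutive spaces produce empty strings in the split; drop them.
    preprocessed_text = " ".join([token for token in tokenized_text if token])

    return preprocessed_text


print(preprocess("def f(x):\n    return x"))  # → "def f(x): return x"
```

The generated explanation in the expected output above walks through exactly these steps, one numbered item per line of the function.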