---
license: apache-2.0
base_model: codeparrot/codeparrot-small
tags:
- generated_from_trainer
model-index:
- name: solidity-generator
  results: []
datasets:
- mwritescode/slither-audited-smart-contracts
pipeline_tag: text-generation
language:
- en
library_name: transformers
widget:
 - text: "contract MyToken is ERC20{"
---


# solidity-generator

This model is specialized in generating Solidity contract code. Fine-tuned from [codeparrot/codeparrot-small](https://huggingface.co/codeparrot/codeparrot-small), it was trained on a large set of Solidity contracts and patterns, making it suitable for drafting or suggesting contract structures.


## Model description

The model is designed specifically for generating Solidity contracts. As a fine-tuned derivative of `codeparrot-small`, it retains the general code-generation capabilities of the parent model while being particularly proficient at understanding and producing Solidity code.

### Performance

The model achieves a loss of `0.2180` on the evaluation set.

## Intended Uses & Limitations


### Intended Uses
1. Assist developers by auto-generating contract code snippets from prompts.
2. Help in understanding and drafting complex contract structures.

### Limitations
1. Generated code must be reviewed for security and functional correctness before use.
2. The quality of the generated code depends largely on the specificity of the prompt, as illustrated below.
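
As an illustration of the second point, compare a terse prompt with a more specific one. These prompts are examples only, not tested outputs:

```python
# Illustrative prompts only; actual completions will vary between runs.
# A terse prompt leaves the model free to generate almost any token contract:
vague_prompt = "contract Token {"

# A prompt carrying a pragma and a comment describing the intent
# constrains generation toward the desired structure:
specific_prompt = (
    "// SPDX-License-Identifier: MIT\n"
    "pragma solidity ^0.8.0;\n\n"
    "// ERC20 token with a capped supply and owner-only minting\n"
    "contract CappedToken is ERC20, Ownable {"
)
```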

## Training Details

### Dataset
The model was fine-tuned on the [mwritescode/slither-audited-smart-contracts](https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts) dataset, which comprises a range of audited Solidity contracts.
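
A minimal sketch of loading this dataset with the `datasets` library; the `all-plain-text` configuration and the `source_code` field name are assumptions based on the dataset's published layout, so check the dataset card before relying on them:

```python
from datasets import load_dataset

# Configuration name and field name are assumptions; see the dataset card.
ds = load_dataset("mwritescode/slither-audited-smart-contracts", "all-plain-text")
print(ds["train"][0]["source_code"][:300])  # peek at one contract's source
```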


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-05
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 144
- num_epochs: 8
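
For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments`; this is not the authors' training script, and arguments not listed above (such as `output_dir` and the evaluation cadence) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="solidity-generator",  # assumed
    learning_rate=7e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=144,
    num_train_epochs=8,
    evaluation_strategy="steps",      # assumed: the table evaluates every 2000 steps
    eval_steps=2000,
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer,
# so it needs no explicit argument here.
```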

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.302         | 0.35  | 2000  | 0.3237          |
| 0.298         | 0.69  | 4000  | 0.2871          |
| 0.232         | 1.04  | 6000  | 0.2645          |
| 0.2415        | 1.38  | 8000  | 0.2522          |
| 0.2261        | 1.73  | 10000 | 0.2431          |
| 0.1924        | 2.07  | 12000 | 0.2332          |
| 0.1913        | 2.42  | 14000 | 0.2282          |
| 0.2152        | 2.76  | 16000 | 0.2215          |
| 0.1508        | 3.11  | 18000 | 0.2180          |


### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3


## How to Use
If you wish to use this model to generate Solidity contract code, follow the steps below:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ckandemir/solidity_generator")
model = AutoModelForCausalLM.from_pretrained("ckandemir/solidity_generator")

# Encode a code prompt
input_text = "contract MyToken is ERC20{"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Sample a completion; pad_token_id is set explicitly because this
# GPT-2-style tokenizer has no dedicated padding token
sample_output = model.generate(
    input_ids,
    do_sample=True,
    max_length=400,
    num_return_sequences=1,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print the generated text
generated_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
print(generated_text)
```
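
Alternatively, the `text-generation` pipeline wraps the same steps (tokenization, sampling, decoding) in one call:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="ckandemir/solidity_generator")
result = generator(
    "contract MyToken is ERC20{",
    do_sample=True,
    max_length=400,
    temperature=0.7,
)
print(result[0]["generated_text"])
```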