ashrafulparan committed on
Commit
d4b9758
1 Parent(s): b60c5a5

Update README.md

  1. README.md +17 -18
README.md CHANGED
@@ -17,29 +17,28 @@ tags:
  
  ## Model Details
  
- ### Model Description
+ ### Model Hyperparameters
+
+ args = TrainingArguments(
+ per_device_train_batch_size = 2,
+ gradient_accumulation_steps = 4,
+ warmup_steps = 5,
+ num_train_epochs = 12,
+ learning_rate = 5e-5,
+ fp16 = not torch.cuda.is_bf16_supported(),
+ bf16 = torch.cuda.is_bf16_supported(),
+ logging_steps = 10,
+ optim = "adamw_8bit",
+ weight_decay = 0.001,
+ lr_scheduler_type = "linear",
+ seed = 3407,
+ output_dir = "outputs",
+ report_to = "none",)
  
  <!-- Provide a longer summary of what this model is. -->
  
  
  
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
-
  ## Citation [optional]
  
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
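
Reviewer note: the hyperparameters added in this commit imply an effective batch size and a learning-rate curve that the README does not state. The sketch below works both out using only the standard library. The constants are copied from the `TrainingArguments` call in the diff. The helper names `effective_batch_size` and `linear_lr`, the device count, and the total step count are hypothetical, since the commit gives neither the dataset size nor the hardware. The schedule is assumed to follow the usual linear warmup then linear decay to zero that `lr_scheduler_type = "linear"` normally selects.

```python
# Values copied from the TrainingArguments call in this commit.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 5
LEARNING_RATE = 5e-5


def effective_batch_size(num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer update.

    num_devices is an assumption; the commit does not say how many GPUs were used.
    """
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at optimizer step `step`, assuming linear warmup
    over WARMUP_STEPS followed by linear decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        factor = step / max(1, WARMUP_STEPS)
    else:
        factor = max(0.0, (total_steps - step) / max(1, total_steps - WARMUP_STEPS))
    return LEARNING_RATE * factor


if __name__ == "__main__":
    total = 100  # hypothetical step count; depends on dataset size and epochs
    print("effective batch size:", effective_batch_size())
    for s in (0, 1, 5, 50, 100):
        print(s, linear_lr(s, total))
```

With a single device, each optimizer update sees 2 × 4 = 8 samples, and the learning rate reaches its peak of 5e-5 after the fifth update before decaying.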