allen0909 committed on
Commit cbaf8df
1 Parent(s): 3cda15e

End of training

Files changed (1)
  1. README.md +7 -7
README.md CHANGED

@@ -7,14 +7,14 @@ tags:
 - generated_from_trainer
 base_model: MediaTek-Research/Breeze-7B-Instruct-v1_0
 model-index:
-- name: ROE_QA_Breeze-7B-Instruct-v1_0_Q30_80_20
+- name: ROE_QA_Breeze-7B-Instruct-v1_0_Q30_80_20_V2
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# ROE_QA_Breeze-7B-Instruct-v1_0_Q30_80_20
+# ROE_QA_Breeze-7B-Instruct-v1_0_Q30_80_20_V2
 
 This model is a fine-tuned version of [MediaTek-Research/Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) on the None dataset.
 
@@ -35,15 +35,15 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
+- learning_rate: 0.0001
 - train_batch_size: 2
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 8
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: constant
-- lr_scheduler_warmup_ratio: 0.03
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 3
 - mixed_precision_training: Native AMP
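For context, here is a minimal sketch of how the V2 hyperparameters above map onto the `transformers` `TrainingArguments` API. Only the values themselves come from this commit; `output_dir` and everything outside the arguments object are illustrative assumptions, not part of the recorded training setup.

```python
# Sketch of the V2 run's configuration via transformers.TrainingArguments.
# Only the hyperparameter values come from the diff above; output_dir and the
# surrounding Trainer wiring are assumptions made for illustration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ROE_QA_Breeze-7B-Instruct-v1_0_Q30_80_20_V2",  # assumed name
    learning_rate=1e-4,             # halved from 2e-4 in the previous run
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # was 4; effective train batch = 2 * 2 = 4
    lr_scheduler_type="cosine",     # was "constant"
    warmup_ratio=0.05,              # was 0.03
    num_train_epochs=3,
    fp16=True,                      # "Native AMP" mixed precision
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the library defaults,
    # so they need no explicit arguments here.
)
```

Note that `total_train_batch_size` in the card is a derived quantity: per-device batch size times gradient accumulation steps, i.e. 2 * 2 = 4 for V2, down from 2 * 4 = 8 in the previous run.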