laurentiubp committed
Commit 05fc736
1 parent: 67a0ba7

Update README.md

Files changed (1):
  1. README.md +26 -11
README.md CHANGED
@@ -32,6 +32,8 @@ The model shows improved proficiency with the Catalan language.
 
 The model achieves a loss rate of 0.8528 on the validation dataset after two epochs.
 
+**NOTE:** The model was trained for one epoch, then the `train` split of the dataset was shuffled and the model was trained for another epoch.
+
 
 **Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/) based on Llama-3 from Meta.
 
@@ -84,7 +86,7 @@ print(outputs[0]["generated_text"][len(prompt):])
 
 The model was trained **with the same prompt template of Llama-3 Instruct**.
 
-The model was trained for two epochs on 6x A100 80GB GPUs using DeepSpeed ZeRO State-3 without CPU offloading.
+The model was trained for two epochs on **6x A100 80GB GPUs using DeepSpeed ZeRO Stage-3** without CPU offloading.
 
 ### Training hyperparameters
 
@@ -99,16 +101,29 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 1.0186        | 0.22  | 200  | 1.0209          |
-| 0.9588        | 0.43  | 400  | 0.9489          |
-| 0.9111        | 0.65  | 600  | 0.9086          |
-| 0.8971        | 0.86  | 800  | 0.8886          |
-| 0.8002        | 1.22  | 1000 | 0.8989          |
-| 0.8068        | 1.43  | 1200 | 0.8835          |
-| 0.7722        | 1.65  | 1400 | 0.8654          |
-| 0.7805        | 1.86  | 1600 | 0.8528          |
+**Epoch 1**
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.0938        | 0.11  | 100  | 1.0779          |
+| 1.0186        | 0.22  | 200  | 1.0209          |
+| 1.0157        | 0.32  | 300  | 0.9808          |
+| 0.9588        | 0.43  | 400  | 0.9489          |
+| 0.9039        | 0.54  | 500  | 0.9244          |
+| 0.9111        | 0.65  | 600  | 0.9086          |
+| 0.8918        | 0.75  | 700  | 0.8961          |
+| 0.8971        | 0.86  | 800  | 0.8886          |
+| 0.8631        | 0.97  | 900  | 0.8846          |
+
+
+**Epoch 2**
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.8002        | 0.22  | 200  | 0.8989          |
+| 0.8068        | 0.43  | 400  | 0.8835          |
+| 0.7722        | 0.65  | 600  | 0.8654          |
+| 0.7805        | 0.86  | 800  | 0.8528          |
 
 
 ## Intended Use
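
The NOTE added in this commit says the `train` split was reshuffled between the two epochs. A minimal, framework-agnostic sketch of that schedule (the helper name `epochs_with_reshuffle` is illustrative, not from the repo; a real run would instead call something like `Dataset.shuffle(seed=...)` on the 🤗 `datasets` split):

```python
import random

def epochs_with_reshuffle(examples, num_epochs, seed=42):
    """Yield one pass over the data per epoch, reshuffling between epochs."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    order = list(examples)
    for _ in range(num_epochs):
        rng.shuffle(order)     # new example order for each epoch
        yield list(order)

# Toy stand-in for the `train` split.
train = [f"example_{i}" for i in range(5)]
epoch1, epoch2 = epochs_with_reshuffle(train, num_epochs=2)
# Both epochs see exactly the same examples, just in a different order.
assert sorted(epoch1) == sorted(epoch2) == sorted(train)
```

Reshuffling between epochs avoids the model seeing batches in an identical order twice, which is the usual motivation for the step described in the NOTE.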
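
The diff states the model keeps the Llama-3 Instruct prompt template. In practice one would let `tokenizer.apply_chat_template` from `transformers` assemble this; the hand-rolled sketch below only illustrates the format, using the special-token names from the Llama-3 release (`build_llama3_prompt` is a hypothetical helper, not part of the repo):

```python
def build_llama3_prompt(user_msg, system_msg=None):
    """Assemble a prompt in the Llama-3 Instruct chat format."""
    parts = ["<|begin_of_text|>"]
    if system_msg is not None:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_msg}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>")
    # Leave the assistant header open so generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(build_llama3_prompt("Qui ets?", system_msg="Ets un assistent en català."))
```

Because training and inference share this template, prompts built any other way would put the model out of distribution.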
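
Assuming the validation loss in the results tables is mean per-token cross-entropy in nats (the usual convention for causal-LM training), the headline figure of 0.8528 can be converted to a perplexity for a more intuitive read:

```python
import math

# Final validation loss from the Epoch 2 table (assumed mean cross-entropy, nats/token).
final_val_loss = 0.8528

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(final_val_loss)
print(f"validation perplexity ≈ {perplexity:.2f}")  # ≈ 2.35
```

On this reading, the drop from 1.0779 at step 100 of epoch 1 to 0.8528 at the end of epoch 2 corresponds to perplexity falling from about 2.94 to about 2.35.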