End of training
README.md
CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime: 13.
-- eval_samples_per_second: 76.
-- eval_steps_per_second: 9.
+- eval_enwikippl: 1882.2876
+- eval_frwikippl: 38923.2266
+- eval_zhwikippl: 63461.6641
+- eval_tinystoriesppl: 451.2739
+- eval_loss: 4.8257
+- eval_runtime: 13.1445
+- eval_samples_per_second: 76.078
+- eval_steps_per_second: 9.51
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -62,23 +62,23 @@ Peak GPU Memory: 8.1729 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
-| 0 | 0 |
-| 1000 | 0.0808 |
-| 2000 | 0.1616 |
-| 3000 | 0.2424 |
-| 4000 | 0.3232 |
-| 5000 | 0.4040 |
-| 6000 | 0.4848 |
-| 7000 | 0.5657 |
-| 8000 | 0.6465 |
-| 9000 | 0.7273 |
-| 10000 | 0.8081 |
-| 11000 | 0.8889 |
-| 12000 | 0.9697 |
-| 12375 | 1.0 |
+| 0 | 0 | 10909.4980 | 77116.0 | 6.3550 | 13.1937 | 75.794 | 9.474 | 4267.7983 | 73081.2031 |
+| 1000 | 0.0808 | 1884.7683 | 38923.2266 | 4.8260 | 13.1354 | 76.13 | 9.516 | 453.2929 | 63529.4258 |
+| 2000 | 0.1616 | 1882.5793 | 38923.2266 | 4.8257 | 13.2412 | 75.522 | 9.44 | 451.5352 | 63461.6641 |
+| 3000 | 0.2424 | 1882.5793 | 38923.2266 | 4.8257 | 13.2384 | 75.538 | 9.442 | 451.6844 | 63461.6641 |
+| 4000 | 0.3232 | 1881.7043 | 38923.2266 | 4.8257 | 13.2242 | 75.619 | 9.452 | 450.9009 | 63461.6641 |
+| 5000 | 0.4040 | 1883.1630 | 38923.2266 | 4.8257 | 13.1558 | 76.012 | 9.501 | 451.8337 | 63461.6641 |
+| 6000 | 0.4848 | 1883.1630 | 38923.2266 | 4.8257 | 13.2198 | 75.644 | 9.456 | 451.8337 | 63461.6641 |
+| 7000 | 0.5657 | 1884.4762 | 38923.2266 | 4.8257 | 13.2183 | 75.653 | 9.457 | 452.8433 | 63529.4258 |
+| 8000 | 0.6465 | 1882.5793 | 38923.2266 | 4.8257 | 13.1236 | 76.198 | 9.525 | 451.4604 | 63461.6641 |
+| 9000 | 0.7273 | 1882.2876 | 38923.2266 | 4.8257 | 13.1445 | 76.078 | 9.51 | 451.2739 | 63461.6641 |
+| 10000 | 0.8081 | 1880.2477 | 38923.2266 | 4.8257 | 13.2204 | 75.641 | 9.455 | 450.4167 | 63461.6641 |
+| 11000 | 0.8889 | 1882.5793 | 38923.2266 | 4.8257 | 13.267 | 75.375 | 9.422 | 451.7592 | 63461.6641 |
+| 12000 | 0.9697 | 1883.1630 | 38923.2266 | 4.8257 | 13.182 | 75.861 | 9.483 | 451.8337 | 63461.6641 |
+| 12375 | 1.0 | 1883.1630 | 38923.2266 | 4.8257 | 13.202 | 75.746 | 9.468 | 451.8337 | 63461.6641 |
 
 ### Framework versions
 - Distily 0.2.0
 - Transformers 4.44.0
 - Pytorch 2.3.0
-- Datasets 2.
+- Datasets 2.20.0
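The throughput columns in the README table can be cross-checked against each other. The sketch below is a sanity check only, not part of the commit. It assumes the columns follow the usual Trainer convention, in which `samples_per_second` is the number of evaluation samples divided by `runtime` and `steps_per_second` is the number of evaluation batches divided by `runtime`. The helper names are hypothetical; the numbers are copied from the final row (step 12375) of the table.

```python
# Values copied from the final row (step 12375) of the README table.
runtime_s = 13.202
samples_per_second = 75.746
steps_per_second = 9.468
total_steps = 12375


def eval_set_size(runtime: float, sps: float) -> int:
    """Approximate number of eval samples (assumes sps = samples / runtime)."""
    return round(runtime * sps)


def eval_batch_size(sps: float, stps: float) -> int:
    """Approximate eval batch size (assumes one step processes one batch)."""
    return round(sps / stps)


def epoch_at(step: int, total: int) -> float:
    """Fraction of the single training epoch completed at a given step."""
    return round(step / total, 4)


n_samples = eval_set_size(runtime_s, samples_per_second)
batch = eval_batch_size(samples_per_second, steps_per_second)
print(n_samples, batch, epoch_at(1000, total_steps))
```

Under these assumptions the figures are consistent with roughly 1000 evaluation samples processed in batches of 8, and the `epoch` column matches `step / 12375`.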
logs/attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=1, hs_loss_fn=mse, hs_weight=2.0, learning_rate=0.0004, lr_scheduler_kwargs=__num_cycles___4_, lr_scheduler_type=cosine_with_restarts, max/events.out.tfevents.1723834178.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4dcf54c0f3ba80cf194440b99eed60de8b69bcd2db4045d4de3c07cf2414325f
+size 307
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d0fb2a2484cd2ddfc1ca74f378aecd493ed96dc95efbcd19968d8b21725ce360
 size 137033984
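Both binary files in this commit are stored as Git LFS pointers: small text files with one `key value` pair per line. As an illustration only, a pointer such as the new `model.safetensors` entry could be read with a few lines of Python. The function name `parse_lfs_pointer` is hypothetical and the sketch does no validation beyond splitting the fields.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer content copied from the new side of the model.safetensors diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d0fb2a2484cd2ddfc1ca74f378aecd493ed96dc95efbcd19968d8b21725ce360
size 137033984
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is unchanged between the old and new pointers (137033984 bytes), so only the weights' content hash differs in this commit.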