Pankaj Mathur committed
Commit: 6a9156d
Parent(s): 7511d16
Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ The training configurations are provided in the table below.
The training runs on 8x A100 (80G) GPUs and takes around 15 hours, at a cost of $180 using [Lambda Labs](https://lambdalabs.com)

- We used DeepSpeed with
+ We used DeepSpeed with fully sharded data parallelism, also known as [ZeRO stage 3](https://engineering.fb.com/2021/07/15/open-source/fsdp/), by writing our own fine-tuning scripts and leveraging some of the model training code provided by the amazing [OpenAlpaca repo](https://github.com/yxuansu/OpenAlpaca)

Here are some of the params used during training:
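The updated line only names the parallelism strategy. As a rough illustration of what a DeepSpeed ZeRO stage 3 fine-tuning setup can look like when wired through the Hugging Face `Trainer`, here is a minimal sketch; the base model name, batch size, epoch count, and launch command are illustrative assumptions, not values taken from this repository's scripts.

```python
# A minimal sketch, not the actual training script from this repo: fine-tuning a
# causal LM with DeepSpeed ZeRO stage 3 via the Hugging Face Trainer.
# Model name, dataset, and hyperparameter values below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# ZeRO stage 3 shards optimizer states, gradients, and model parameters across GPUs.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

model_name = "openlm-research/open_llama_7b"  # placeholder base model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=4,   # illustrative value, not from this repo
    num_train_epochs=3,              # illustrative value, not from this repo
    bf16=True,
    deepspeed=ds_config,             # accepts a dict or a path to a JSON config file
)

# trainer = Trainer(model=model, args=training_args, train_dataset=..., tokenizer=tokenizer)
# trainer.train()
# Launched across 8 GPUs with the DeepSpeed launcher, e.g.: deepspeed --num_gpus=8 train.py
```

With stage 3, parameters, gradients, and optimizer states are all sharded across the eight GPUs, which is what makes full fine-tuning of a multi-billion-parameter model fit on a single 8x A100 node.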