Pankaj Mathur committed
Commit: 6a9156d
Parent(s): 7511d16
Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ The training configurations are provided in the table below.
The training runs on 8x A100 (80G) GPUs and takes around 15 hours, at a cost of $180 using [Lambda Labs](https://lambdalabs.com)

- We used DeepSpeed with
+ We used DeepSpeed with fully sharded data parallelism, also known as [ZeRO stage 3](https://engineering.fb.com/2021/07/15/open-source/fsdp/), by writing our own fine-tuning scripts and leveraging some of the model training code provided by the amazing [OpenAlpaca repo](https://github.com/yxuansu/OpenAlpaca)

Here are some of the params used during training:
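The updated line only names the parallelism strategy. As a rough illustration of what a DeepSpeed ZeRO stage 3 fine-tuning setup can look like when wired through the Hugging Face `Trainer`, here is a minimal sketch; the base model name, batch size, epoch count, and launch command are illustrative assumptions, not values taken from this repository's scripts.

```python
# A minimal sketch, not the actual training script from this repo: fine-tuning a
# causal LM with DeepSpeed ZeRO stage 3 via the Hugging Face Trainer.
# Model name, dataset, and hyperparameter values below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# ZeRO stage 3 shards optimizer states, gradients, and model parameters across GPUs.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

model_name = "openlm-research/open_llama_7b"  # placeholder base model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=4,   # illustrative value, not from this repo
    num_train_epochs=3,              # illustrative value, not from this repo
    bf16=True,
    deepspeed=ds_config,             # accepts a dict or a path to a JSON config file
)

# trainer = Trainer(model=model, args=training_args, train_dataset=..., tokenizer=tokenizer)
# trainer.train()
# Launched across 8 GPUs with the DeepSpeed launcher, e.g.: deepspeed --num_gpus=8 train.py
```

With stage 3, parameters, gradients, and optimizer states are all sharded across the eight GPUs, which is what makes full fine-tuning of a multi-billion-parameter model fit on a single 8x A100 node.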