save_pretrained showing larger files than the ones in the repo
Hi, I ran the steps below.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")
base_model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto")

# I saved the model after this.
base_model.save_pretrained("/home/ec2-user/SageMaker/models/dolly-v2-3b", from_pt=True)
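A quick sanity check on what ended up in the save directory, as a sketch using the same path as above:

import os

save_dir = "/home/ec2-user/SageMaker/models/dolly-v2-3b"
files = os.listdir(save_dir)
# Total up everything save_pretrained wrote (config, shard index, weight shards).
total_gb = sum(os.path.getsize(os.path.join(save_dir, f)) for f in files) / 1e9
print(f"{len(files)} files, {total_gb:.2f} GB total")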
When I look at the saved files, they are different from the ones you see in the repo.
For instance, the repo has a single 5.68 GB .bin file, but save_pretrained wrote two .bin files: one is 10.1 GB and the other is 1.15 GB. This does not match the files in this repo.
Any idea why this is happening? What are the implications of the larger model size?
Here is what I get after saving the pretrained model.
It's because you did not load in 16-bit, I'd imagine. from_pretrained defaults to float32 unless you pass torch_dtype, so you're saving the weights at twice the precision and twice the storage.
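If you want the saved copy to match the size of the repo checkpoint, a minimal sketch is to load in half precision before saving (torch_dtype is the standard from_pretrained argument for this):

import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint in half precision instead of the float32 default,
# so save_pretrained writes roughly the same size as the repo file.
base_model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    torch_dtype=torch.float16,
    device_map="auto",
)
base_model.save_pretrained("/home/ec2-user/SageMaker/models/dolly-v2-3b")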
I see. When I ran inference with the model loaded from the saved files, the latency was 3-4 times higher than with the model loaded directly via from_pretrained. Shouldn't the latency be the same in both cases?
No, because you are doing more than twice the work in 32-bit math. I don't see why you are doing it this way, though.
Based on what you say, I am loading it from Hugging Face in 32-bit originally as well. Is that right? But the latency is much lower on that one. How is that happening?
Ah, OK, I mistook the setup: you're benchmarking the model loaded this way too, without saving. Yeah, it should be the same thing. Check the torch_dtype in both cases to confirm. Otherwise I'm not sure why, or maybe I'm wrong about precision being the issue.
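One way to run that check without reloading both full models, as a sketch against the paths used above:

from transformers import AutoConfig

# The config saved alongside the weights records the dtype they were serialized in
# (it may be None on older checkpoints).
print(AutoConfig.from_pretrained("databricks/dolly-v2-3b").torch_dtype)
print(AutoConfig.from_pretrained("/home/ec2-user/SageMaker/models/dolly-v2-3b").torch_dtype)

# For what is actually used at inference time, check the loaded model directly:
# print(next(base_model.parameters()).dtype)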
Are you sure you are unloading the first model before loading the second? Otherwise device_map="auto" might only fit part of the second model on the GPU and offload the rest, which would explain the extra latency.
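For example, something along these lines (a sketch; it assumes base_model from the earlier snippet is the first model still in memory):

import gc
import torch
from transformers import AutoModelForCausalLM

# Free the first model before loading the second, otherwise device_map="auto"
# may only place part of the second model on the GPU.
del base_model
gc.collect()
torch.cuda.empty_cache()

reloaded = AutoModelForCausalLM.from_pretrained(
    "/home/ec2-user/SageMaker/models/dolly-v2-3b",
    device_map="auto",
)
# Any "cpu" or "disk" entries here would explain the extra latency.
print(reloaded.hf_device_map)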