Apology and Notification

#2
by John6666 - opened


Perhaps you have already noticed and fixed it yourself, but I had introduced an extremely simple yet fatal bug. My apologies.😔

  • Wrong code (destroys CLIP)
k.replace("vae.", "").replace("model.diffusion_model.", "")\
        .replace("text_encoders.clip_l.transformer.text_model.", "")\
        .replace("text_encoders.t5xxl.transformer.", "")
  • Correct code
k.replace("vae.", "").replace("model.diffusion_model.", "")\
        .replace("text_encoders.clip_l.transformer.", "")\
        .replace("text_encoders.t5xxl.transformer.", "")
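To make the difference concrete, here is a minimal sketch in plain Python (no torch needed). The key string below is only an illustrative example of a typical CLIP-L key as it appears in single-file FLUX.1 checkpoints; the point is that the extra ".text_model" in the wrong chain also strips the "text_model." prefix that the standalone text encoder's state dict expects.

```python
def remap_wrong(k: str) -> str:
    # Buggy chain: strips "text_encoders.clip_l.transformer.text_model."
    return (k.replace("vae.", "")
             .replace("model.diffusion_model.", "")
             .replace("text_encoders.clip_l.transformer.text_model.", "")
             .replace("text_encoders.t5xxl.transformer.", ""))

def remap_correct(k: str) -> str:
    # Fixed chain: strips only "text_encoders.clip_l.transformer.",
    # so the "text_model." prefix survives.
    return (k.replace("vae.", "")
             .replace("model.diffusion_model.", "")
             .replace("text_encoders.clip_l.transformer.", "")
             .replace("text_encoders.t5xxl.transformer.", ""))

# Hypothetical example key for illustration:
key = "text_encoders.clip_l.transformer.text_model.encoder.layers.0.mlp.fc1.weight"

print(remap_wrong(key))    # encoder.layers.0.mlp.fc1.weight  (prefix lost)
print(remap_correct(key))  # text_model.encoder.layers.0.mlp.fc1.weight
```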

Also, the FLUX.1 model is too large to run in my local environment, so these results come only from testing in HF's free CPU Space, but I noticed some behavior that concerned me.

On some models, huggingface_hub.save_torch_state_dict freezes without printing an error or raising an exception.
Notably, it happens only when saving a transformer (UNet).
I traced it with print(f""), a paleolithic method, and it is hard to believe that RAM, CPU, or disk usage is the cause; I confirmed that everything works fine up to huggingface_hub.split_torch_state_dict_into_shards.
So it is probably failing inside the internal safetensors.torch.save_model part.
I hope it is just a lack of specs and it is merely stuck in some odd place...
If you have problems with save_pretrained, suspect this.

Specifically, I have seen this occur when saving with torch.float8_e4m3fn on the following model:
https://huggingface.co/datasets/John6666/flux1-backup-202408/blob/main/theAraminta_flux1A1.safetensors

Thanks for the discussion

Sorry for reporting this in an unrelated repo (or perhaps it is related?) due to the urgency.
Thank you for your constant development.πŸ€—

P.S.

Regarding the above problem, I ran the same code experimentally in a ZeroGPU Space (with no code changes and without using the GPU directly), and the problem did not occur.
I am relieved to know that it is not a bug in the library (and/or my code), but rather a lack of VM specs.
Although I saw no difference in RAM (including page file) usage, it may be due to a difference in the VM's underlying performance apart from the GPU, or some work being implicitly offloaded to the GPU or VRAM.
Anyway, sorry for the trouble.
