My version runs both Dev and Schnell on a 3090 using quantized models, with a gradio front end.
Both models are quantized on startup, so launch takes a few minutes, but after that I get image generations in under 2 minutes for Dev and just a few seconds for Schnell.
Here is the GitHub repo and a video explaining it:
https://github.com/NuclearGeekETH/NuclearGeek-Flux-Capacitor
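For reference, the core pattern looks roughly like this. This is a minimal sketch using diffusers + optimum-quanto with a bare-bones gradio wrapper, not the repo's actual code; the model IDs, fp8 weights, step counts, and the CPU-offload call are my assumptions:

```python
import torch
import gradio as gr
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

def load_quantized(model_id: str) -> FluxPipeline:
    # Load in bf16, then quantize the two heaviest components
    # (the DiT transformer and the T5 text encoder) to fp8.
    pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    quantize(pipe.transformer, weights=qfloat8)
    freeze(pipe.transformer)
    quantize(pipe.text_encoder_2, weights=qfloat8)
    freeze(pipe.text_encoder_2)
    # Assumption: offloading is what lets both models coexist on a 24 GB 3090.
    pipe.enable_model_cpu_offload()
    return pipe

# The slow startup step: quantize both models once, up front.
PIPES = {
    "dev": load_quantized("black-forest-labs/FLUX.1-dev"),
    "schnell": load_quantized("black-forest-labs/FLUX.1-schnell"),
}

def generate(prompt: str, model: str):
    if model == "dev":
        return PIPES["dev"](prompt, num_inference_steps=28).images[0]
    # Schnell is step-distilled: few steps, no real CFG, shorter T5 context.
    return PIPES["schnell"](
        prompt, num_inference_steps=4, guidance_scale=0.0, max_sequence_length=256
    ).images[0]

gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Radio(["dev", "schnell"], value="dev")],
    outputs=gr.Image(label="Result"),
).launch()
```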
This makes one hell of a difference in inference speed. I tested with 28 steps and quantized text_encoder_2 (the T5 encoder), not text_encoder.
Ran in 35 seconds flat, nice!! I'm on an RTX 4090, so no CPU offloading is needed. The result was very good.
Note: if running fully on the GPU (without enable_model_cpu_offload()), you should quantize BEFORE sending the pipeline to device="cuda".
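In code, that ordering looks like this (a sketch assuming the diffusers FluxPipeline and optimum-quanto; the prompt and seed are placeholders, and presumably the point of the order is that the full bf16 weights never have to sit in VRAM before quantization):

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Quantize while the weights are still on the CPU...
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)  # the T5 encoder, not CLIP
freeze(pipe.text_encoder_2)

# ...then move the whole pipeline to the GPU.
pipe.to("cuda")

image = pipe(
    "a flux capacitor on a workbench",
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("flux.png")
```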
Running transformer freeze DEV
Running text_encoder freeze DEV
seed = 17894334164879757554
100%|██████████| 28/28 [00:35<00:00,  1.28s/it]