How to load models?
Do you always have to convert an existing model locally? Or are there files (like the usual .ckpt files) that you can download and load?
I've been trying to figure it out for the past few days but I guess I need some help haha.
@maavangent
There are some converted models zipped in this repository; you can go to the "Models" tab, tap the download model button (a box with an arrow pointing down), and enter the zip URL there.
You can get the URLs from the models.json file; you should use the "model" URL of the model you want to download.
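For illustration, here's a minimal sketch of pulling that "model" URL out of a models.json file with Swift's Codable; the entry structure and field names other than "model" are assumptions made for the example, not the actual schema:

```swift
import Foundation

// Hypothetical shape of a models.json entry; the real schema may differ.
// The "model" field is the zip URL you paste into the download dialog.
struct ModelEntry: Codable {
    let name: String?
    let model: URL
}

do {
    let data = try Data(contentsOf: URL(fileURLWithPath: "models.json"))
    let entries = try JSONDecoder().decode([ModelEntry].self, from: data)
    for entry in entries {
        print(entry.name ?? "unnamed", "->", entry.model.absoluteString)
    }
} catch {
    print("Could not read models.json: \(error)")
}
```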
In any case, I'm working on a Model Converter app that will simplify the conversion process, and on an update to allow downloading these preconverted models from the app.
Thank you for the comment :)
I'm running a 2 TB 16" 2021 MacBook Pro with 64 GB of RAM and 32 GPU cores… but it's no 4090. Much of the time I'm rendering with several implementations open at once.
I've installed Diffusion Bee, Auto1111, and InvokeAI, as well as the CoreML variant Mochi Diffusion, and I'm running the beta version of CoreML PromptToImage through TestFlight. I'm happy to support the Mac community and any CoreML efforts coming about, so I'm about to purchase Guernika from the App Store.
Surprisingly, I hadn't heard of it before randomly coming across it here just now when searching for CoreML models. If you haven't already, you should post to Reddit r/stablediffusion to get some eyes on it.
What are your plans for the future? Do you have a Discord to discuss ideas for tentative features and implementations?
Thanks!
That's a good idea, I will post something on that Reddit!
As for future plans, so far I'm using the app and implementing things when I think they would be useful; I have plans to improve collection viewing, prompt history...
You could share ideas here if you want, or if you think a Discord would be useful I could try to set one up, or maybe a Reddit?
Thank you for the support!
Reddit would be great; you could also link to the GitHub or Hugging Face pages with your profile, or to this page, as an option for the Help tab in the app.
What rendering algorithm is being employed by Guernika? Is it PNDM, DPM-Solver, or (…)? Thanks again!
At the moment Guernika uses PNDM, which is the default one, but it has support for DPM-Solver too. I have to add an option to change that, but I was not able to find a lot of information on what the difference really is and didn't want to confuse people. If you are asking for it, it definitely seems people would find it useful.
Thanks, I know from reading the Mochi Diffusion GitHub page that DPM++ gives great results after only 10-25 steps. I'm not familiar with either, but was wondering if PNDM does as well; the option would be good to have.
@Michaelangelo I will add that option in the next update then :). Do you have any other requests?
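For reference, a rough sketch of how such an option might map a UI choice onto the scheduler enum from Apple's ml-stable-diffusion Swift package; exact type and case names can vary between package versions, so treat this as an assumption rather than Guernika's actual code:

```swift
import StableDiffusion // Apple's ml-stable-diffusion Swift package

// Maps a hypothetical UI selection onto the package's scheduler enum.
// Case names follow the ml-stable-diffusion package and may vary by version.
enum SamplerOption: String, CaseIterable {
    case pndm = "PNDM"
    case dpmSolverMultistep = "DPM-Solver++"

    var scheduler: StableDiffusionScheduler {
        switch self {
        case .pndm:
            return .pndmScheduler
        case .dpmSolverMultistep:
            return .dpmSolverMultistepScheduler
        }
    }
}
```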
Short-term: A lot depends on what we can do given Apple's implementation.
I noticed there's the option for single image or continuous inference. It would be nice to have preset values between 1 and 100, e.g. 10, 20, 40, 50, 100, if not increments of n=1.
Also, a way to bulk delete images, rather than having to select each image manually with a right click to delete them one-by-one.
How are tokens handled by the app, and what's the maximum? After crossing the max-token threshold, are any further tokens silently dropped, or are they merged together, as in the novel solution employed by the Automatic1111 repo, which consequently doesn't have a length limitation? Something along the lines of Auto1111's approach would be the ideal solution, if it can be worked in.
What's the syntax for tokens, and how is prompt weighting handled per token? Are different weights allowed, as in the Auto1111 instance, with parentheses and brackets or values like 1.1, 1.2, etc.?
Long-term: Inpainting, outpainting, Dreambooth and LoRA training, different output sizes.
@Michaelangelo I will think of a nice way of adding the image limit :)
Yes, I also want a nicer way of dealing with lots of images, but it comes with a lot of things to take into account; maybe I could add a "Show in Finder" option for now, which would allow selecting multiple images.
At the moment images are stored here /Users/{YOUR_USER}/Library/Containers/com.guiyec.Guernika/Data/Documents/Images
At the moment they are truncated at the TextEncoder's input length. I will take a look at merging, but that seems tricky to test; I'm not promising anything here :)
Same for prompt weighting; I have to take a deeper look at how this is handled. At the moment the prompt is just fed into the TextEncoder, and I'm not sure if it's actually taking the weights into account.
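For context, here's an illustrative sketch of the difference between truncating and Auto1111-style chunking, assuming a CLIP-style 77-token window; the function names are hypothetical and not taken from Guernika's code:

```swift
// Illustrative only: CLIP-style text encoders take a fixed window of token IDs
// (77 including the start/end tokens). Truncation drops the excess; Auto1111-style
// "merging" encodes 75-token chunks separately and joins the resulting embeddings.
func truncate(_ tokenIDs: [Int], to maxLength: Int = 77) -> [Int] {
    Array(tokenIDs.prefix(maxLength))
}

func chunk(_ tokenIDs: [Int], window maxLength: Int = 77) -> [[Int]] {
    // Reserve two slots per chunk for the start/end tokens the encoder adds.
    let chunkSize = maxLength - 2
    return stride(from: 0, to: tokenIDs.count, by: chunkSize).map { start in
        Array(tokenIDs[start..<min(start + chunkSize, tokenIDs.count)])
    }
}
```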
Inpainting should already be working; it's not an ideal solution, but you should be able to load an inpainting model and draw a mask to generate new images.
Outpainting would be cool; I have to improve the inpainting implementation, and that will hopefully facilitate outpainting.
Any kind of training will probably be out of scope or very far into the future.
Finally, different output sizes: Apple mentions how this could work and recommends what to do when converting models. I have tried a lot, and I have not been able to convert any models with variable output sizes, or even different fixed output sizes, that actually work. I really want this to work, but we may have to wait for Apple to fix something in CoreML tools before we get it.
I noted that the solution employed by the developers of the CoreML SD GUI PromptToImage is to have a different model for each image output size; this is obviously not as convenient as a drop-down box as with Python model implementations, but it's a stopgap measure.
@GuiyeC Also, for selecting multiple images, see the solution employed by PromptToImage for navigating the gallery, both for viewing images with the arrow keys and for selecting multiple images for deletion. I believe their project is open source and on GitHub, so you should be able to copy the handling code over, with proper attribution of course.
Where did you see the different models for different output sizes?
Can you link to this PromptToImage project? Found it
It would be nice to have a default model loaded at startup; I noticed every time I start I've got to select and load one…
I also noticed there's no upper limit to the numbers for steps or guidance, when most implementations cap steps at 75-100 and most guidance scales cap at 24. What, realistically, is the result of setting a guidance scale to some crazy high number like 523? Is it clipped behind the scenes to the SD limit of 24?
@Michaelangelo I did not see the limits you mention in the Python implementation; where did you see them? I have not tinkered a lot with guidance, but I have tried using more than 100 steps with no problems. I have not tested whether the gains are visible after a certain number of steps, though.
Maybe I can add an option to autoload the last model, or autoload it with an option to cancel loading? I agree that having it loaded on startup would be nicer; I didn't do it at first, as it can take a while to actually load and people might want to switch models.
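As a sketch of what remembering the last model might look like, assuming UserDefaults and a hypothetical load callback (not Guernika's actual implementation):

```swift
import Foundation

// Hedged sketch: remember the last loaded model and offer to reload it on launch.
// The "lastModelPath" key and the load callback are hypothetical, not Guernika's API.
let lastModelKey = "lastModelPath"

func rememberModel(at url: URL) {
    UserDefaults.standard.set(url.path, forKey: lastModelKey)
}

func autoloadLastModelIfAvailable(load: (URL) -> Void) {
    guard let path = UserDefaults.standard.string(forKey: lastModelKey),
          FileManager.default.fileExists(atPath: path) else { return }
    // Could be wrapped in a cancellable Task so the user can abort a slow load.
    load(URL(fileURLWithPath: path))
}
```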
Thank you again for all of these comments, I really appreciate them :)
Also, I followed your advice and created a Reddit community and posted on r/StableDiffusion.
Excellent!
There are many different implementations, but by far the most popular is the Automatic1111 repo (Wiki, Features, Scripts, Extensions).
UI Values
Sampling steps: (max) 150, (default) 20
Size: (max) 2048×2048, (default) 512×512
Batch count: (max) 100, (default) 1
Batch size: (max) 8, (default) 1
Guidance scale: (max) 30, (default) 6-8. Depending on the model, anything over 12-15 can result in noise from overbaked, overtrained results.
Whether or not there are gains to be had at higher step counts depends in part on the sampler selected. See the attached grid plot showing the effect of different CFG levels, and the Reddit link with a run-down comparison of different samplers at different step counts.