Yikes
I was going to download until I read the part about it being broken. Thanks for the effort anyway. Three hours is a long time... I didn't know converting to exl2 was that bad.
I'm sorry, apparently the last update of Oobabooga broke EXL2, so I'm waiting a little before taking it down, to double-check.
I hope you can use the GGUF anyway!
No worries, and thanks for the heads up about the latest oobabooga update. I think I'll skip that one, lol...
This works pretty well for me..?
Are you on Oobabooga?
Also, thanks for the double-check!
Ooba yep, ExLlama 2, 4096 context. Takes up 19GB of VRAM on a 3090; gens are fast and the quality is really good. (8k context just barely doesn't fit: it starts using system RAM and becomes extremely slow.) Way better coom output than Mytho, and I feel like it's smarter too, following defs a lot better. So I don't think it's broken at all.
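If you want to sanity-check the quant outside Ooba, something like this minimal exllamav2 sketch should work; the model path and sampler values below are placeholders, and the calls mirror the library's own example scripts:

```python
# Minimal sketch: load an EXL2 quant with the standalone exllamav2 library.
# The model directory and sampler settings are placeholders, not from this thread.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quant"  # placeholder: folder containing the EXL2 files
config.prepare()
config.max_seq_len = 4096                 # the 4096 context discussed above

model = ExLlamaV2(config)
model.load()                              # loads all layers onto the GPU

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)             # KV cache sized to max_seq_len
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8                # placeholder sampling values
settings.top_p = 0.9

print(generator.generate_simple("Hello,", settings, 64))
```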
Nice! I ended up downloading it. At 1500-2048 context it takes up about 14GB of VRAM, slightly more than my budget 3060 can handle, so about 2GB spills over to the CPU, but it still responds surprisingly fast: I'm getting 2 tokens per second. I guess this must be a result of the exl2 format? With ExLlama v1 + GPTQ, going 2GB over the GPU limit dropped me to about 0.24 tokens per second.
It's definitely worth the wait. Way better than MythoMax 13B.