Error during inference
I'm loading the model the same way as all the other exl2 models via Ooba. The loading works fine, but it crashes as soon as it has to do inference. This never happened before with any other exl2 model.
Any idea why this happens with this model only:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\Projects\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\Projects\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\sparse.py", line 164, in forward
return F.embedding(
^^^^^^^^^^^^
File "F:\Projects\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
Output generated in 1.00 seconds (0.00 tokens/s, 0 tokens, context 565, seed 892941312)
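In case it helps, the failure is in the embedding lookup, and that particular IndexError usually means a token id fell outside the model's embedding table. A minimal repro of the same error with made-up sizes (not the real Yi vocab) looks like this:

import torch

# made-up vocab size just to illustrate the failure mode
emb = torch.nn.Embedding(64000, 16)
bad_ids = torch.tensor([[123, 64000]])  # 64000 is one past the last valid row
emb(bad_ids)  # raises: IndexError: index out of range in self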
Hi, thanks for trying out the model! And sorry to see it's not working right.
I'm pretty new to this, so I'm not entirely sure what went wrong. It's probably a fault somewhere in my quant process. I've been using the 6bpw quant and that works well, so I haven't really tested the other ones since I quantized them all the same way. I'll investigate and get back to you if I can get it working!
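In the meantime, if anyone wants to poke at it, one quick sanity check would be to compare the vocab size declared in the quant's config.json with the tokenizer length. This is just a sketch and assumes the config and tokenizer files were copied into the quant folder unchanged (the path is a placeholder):

import json
from transformers import AutoTokenizer

quant_dir = "path/to/the/4bpw/quant"  # placeholder for the downloaded folder
with open(f"{quant_dir}/config.json") as f:
    vocab_size = json.load(f)["vocab_size"]
tokenizer = AutoTokenizer.from_pretrained(quant_dir)
# a tokenizer longer than vocab_size would explain the IndexError above
print(vocab_size, len(tokenizer))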
Thank you very much for your help! Let me know if you need anything from my side that could help.
It works for me on Tabby; only the XTC sampler doesn't, but apart from that it works as intended. Did you try running ooba with the --trust-remote-code flag in your CMD_FLAGS file? It's a classic Yi 34B shenanigan. You can also try tabbyAPI, which is better for exl2 even if it's a bit less noob-friendly (you can link it to ooba's models directory and (down)load models from exUI/ST), so it's worth checking out.
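If you haven't used that file before, it just means putting the flag on its own line in CMD_FLAGS.txt in the webui folder, something like:

--trust-remote-code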
You're right, it works in Tabby, but Tabby is a bit slower than Ooba for me and has a weird VRAM leak, at least in the newest version.
And all other Yi 34B models work fine for me through Ooba. I can even run the normal magnum exl2. Maybe it's because I'm on the brand-new version of Ooba, I don't know.
@mammour Thanks for the tip! I tried ooba with --trust-remote-code on, but no luck there. I'll have to give Tabby a shot.
Not sure yet why 4bpw and under are getting errors. I use the 6bpw version, which I made the same way, and it works fine in ooba without trust-remote-code, but admittedly I'm kind of winging this as I go lol
I tried it especially for this ticket before leaving for Corsica for a vacation. I'd love to know which samplers make it a good model (for the smarts) for when I come back, because all of my older Yi presets (which weren't for ChatML) dumbed it down.
Also, I'll try to see if I can put together a bugfix for ooba if no one finds a cause or solution in the next two weeks, though I haven't used ooba in months haha
Make sure you test with the newest version of Ooba; that's the one that crashed on inference with this model for me.