Formatting errors, sends a long stream of markdown
I think the output of this model is some of your best; it has a certain awareness and variety. In the default assistant personality it even reminds me a bit of Pi.
Unfortunately, after the good output it tacks on a long string of miscellaneous markdown, even when using your recommended stopping strings.
I experience the same. The narrative output seems really good, but it's then followed by a string of random stuff.
Essentially the same issue I've had. It's crazy, because this might be the very best model of this size I've ever used in terms of output quality, but the stuff it sometimes sends after generating a great response is frightening. It also really likes asterisk RP style; it is incredibly difficult to convince it to use novel style.
I've been busy the past few days, but I got around to trying to replicate this yesterday.
Truth be told, I don't know shit. I run everything perfectly fine, with stopping strings and temperature at the recommended values. It just works for me, using the unquantised model through the Aphrodite engine.
What are you all running it at, and with?
Edit:
V2A2 (privated) had some occasional run-on issues, which I addressed with an RL run in v3; that one actually worked for me.
ChatML or MistralML (ChatML but with [INST] instead) both work fine in my tests at 0 context, 3k context, and 10k context.
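Roughly, that means turn structures like these (a sketch only; the exact special tokens depend on the tokenizer config):

```
ChatML:
<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant

MistralML (same structure, with [INST]/[/INST] swapped in):
[INST]user
{message}[/INST]
[INST]assistant
```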
No rush, we have patience. Thanks for your hard work.
I've tried both Statuo's exl2 6bpw quant and mradermacher's iQ5_K_M GGUF quant. I only have 12GB VRAM, so I can't run the unquantized model afaik.
I stuck to your recommended 0.7-1 temp and 0.1-0.2 minP (with some DRY for good measure).
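Concretely, something along these lines (the DRY values are just reasonable examples, not from the model card):

```
temperature: 0.7 - 1.0
min_p: 0.1 - 0.2
dry_multiplier: 0.8        # enables DRY; 0 turns it off
dry_base: 1.75             # common default
dry_allowed_length: 2      # common default
```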
It will absolutely, 99% of the time, force asterisk-style internet RP, even if I regex the entire context to be "Written in novel style." It'll occasionally use quotation marks but randomly drop them.
With slightly changed stopping strings it works quite a bit better:
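For example, something along these lines in the custom stopping strings field (illustrative only; use whatever matches your template's turn markers):

```
["<|im_end|>", "<|im_start|>", "[INST]", "[/INST]"]
```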
Output is still great. It'll sometimes misgender itself or the user, and it shows no ability to keep track of characters' state of dress or undress, but everything else is near perfect, including anatomy, positioning, chronology, etc. It writes the best combat scenes I've ever seen from small models, too.
About the model not stopping when it should: I've had problems like this with the llama.cpp server. It skips special tokens in the output by default, and when special tokens are skipped, they don't trigger stop strings. The only fix I found is to run llama-server with the --special argument so that the stop strings can work.
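For example (model path and port are placeholders):

```
# --special makes the server emit special tokens, so stop strings can match them
./llama-server -m ./model.Q5_K_M.gguf --special --port 8080
```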
I believe oobabooga has a similar setting with "skip special tokens" and I'd guess it has the same problem. Not sure about other frontends and backends.
It's not related to llama.cpp. I've also tried Statuo's 4bpw and mradermacher's Q5_K_M, and both have the same issue. My settings look the same as @InvictusCreations', and I'm running the models via Ooba.
And this issue is really only present in this particular model; even Lyra v1 didn't show this behavior.
Thanks @Sao10K for your work!