GGUF parameters suggestion
These are my recommended settings for the Q4 GGUF file.
Parameters
temperature=0.67,
top_p=1,
top_k=0,
repetition_penalty = 1.5
Parameters used when loading the model into memory
compress_pos_emb = 8 #(also known as linear RoPE scaling, I believe)
rope_freq_base = 450000
n_ctx = 32768
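For reference, here's a minimal sketch of how these settings might map onto llama-cpp-python. This is an assumption on my part: the model filename is a placeholder, parameter names differ between loaders, and `compress_pos_emb = 8` in text-generation-webui should correspond to `rope_freq_scale = 1/8` in llama.cpp.

```python
# Minimal sketch, assuming llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="aurelian-q4.gguf",  # placeholder filename
    n_ctx=32768,                    # full 32K context window
    rope_freq_base=450000,
    rope_freq_scale=0.125,          # 1 / compress_pos_emb (linear RoPE scaling)
)

out = llm(
    "Write the opening scene of a mystery novel set in a small coastal town.",
    max_tokens=2048,
    temperature=0.67,
    top_p=1.0,
    top_k=0,             # 0 disables top-k sampling in llama.cpp
    repeat_penalty=1.5,  # repetition_penalty above
)
print(out["choices"][0]["text"])
```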
With these settings I get a very coherent story for the first 1000-2000 tokens. After the 2000-2500 token mark it gradually starts to make things up, but it keeps a solid story structure.
Hope this helps!!
I use temperature=0.7, top_p=0.8, top_k=90, repetition_penalty = 1.16, repetition_penalty_range = 4096, though it probably works for a broad range of values. ~2000 tokens is about the max you can get while following a given prompt/outline out of any model these days, I think. But Aurelian is trained for multi-round, so you can keep going pretty much forever; you just need to prompt and re-direct every 2K tokens or so.
Here is an example of a one-shot story. Most of the model's training is actually not for one-shot stories but for scene-by-scene writing, so you can continue the story by following up with the next scene, and it will maintain consistency across the entire 32K context window. A rough sketch of that multi-round loop follows.
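This is just an illustration, not the model's official API: it assumes llama-cpp-python's chat interface, the scene directives are placeholders, and repetition_penalty_range has no direct equivalent there, so it is omitted.

```python
# Rough sketch of multi-round, scene-by-scene prompting, assuming the
# `llm` object from the loading example above.
messages = [
    {"role": "system", "content": "You are a novelist writing scene by scene."},
    {"role": "user", "content": "Scene 1: introduce the detective and the missing ledger."},
]

# Placeholder directives; in practice you'd re-direct every ~2K tokens.
for directive in [
    "Scene 2: the detective questions the harbormaster.",
    "Scene 3: a second ledger surfaces at the auction house.",
]:
    reply = llm.create_chat_completion(
        messages=messages,
        max_tokens=2048,       # roughly one scene per round
        temperature=0.7,
        top_p=0.8,
        top_k=90,
        repeat_penalty=1.16,
    )
    scene = reply["choices"][0]["message"]["content"]
    print(scene)
    # Keep the growing story in context, then steer the next scene.
    messages.append({"role": "assistant", "content": scene})
    messages.append({"role": "user", "content": directive})
```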