Feedback and Preset Suggestions

#1
by Dunjeon - opened
  • Follows instructions pretty well. Context memory, recall, and application are solid.
  • It struggles a bit on the logic side of things (strong prompts help solve the issue).

The sampler settings I am using are... drastic, but they are giving me favorable output. I have only tested up to ~10k context. As a note, I lean more towards logic than creativity, and my samplers (if you want them) reflect that.

With these settings: Logic 8/10 | Creativity 6/10

                {"temperature", 0.4},
                {"temperature_last", false},
                {"top_p", 0.95},
                {"top_k", 25},
                {"top_a", 0.1}, //lean slightly more towards logical and coherent outputs rather than highly creative or unexpected ones.
                {"tfs", 1},
                {"epsilon_cutoff", 0},
                {"eta_cutoff", 0},
                {"typical_p", 0.9},
                {"min_p", 0.8},  //Safety Net: for top_p edge cases
                {"rep_pen", 1.1},
                {"rep_pen_range", 4096},
                {"rep_pen_decay", 0},
                {"rep_pen_slope", 1},
                {"no_repeat_ngram_size", 2}, //Prevents the model from repeating any 2-gram sequences, further reducing redundancy.
                {"penalty_alpha", 0},
                {"num_beams", 1},
                {"length_penalty", 1},
                {"min_length", 0},
                {"encoder_rep_pen", 1},
                {"freq_pen", 0},
                {"presence_pen", 0.1},
                {"skew", 0},
                {"do_sample", true},
                {"early_stopping", false},
                {"dynatemp", true},
                {"min_temp", 0.3},
                {"max_temp", 0.5},
                {"dynatemp_exponent", 0.85},
                {"smoothing_factor", 0.3},
                {"smoothing_curve", 1},
                {"dry_allowed_length", 2},
                {"dry_multiplier", 0.8},
                {"dry_base", 1.75},
                {"dry_sequence_breakers", "[\"\\n\", \",\", \"\\\"\", \"*\"]"},
                {"dry_penalty_last_n", 4096},
                {"add_bos_token", true},
                {"ban_eos_token", false},
                {"skip_special_tokens", true},
                {"mirostat_mode", 1},  //This can help in producing more human-like and contextually appropriate responses.
                {"mirostat_tau", 5},
                {"mirostat_eta", 0.1},
                {"guidance_scale", 1},
                {"negative_prompt", ""},
                {"grammar_string", ""},
                //{"json_schema", {}},
                {"banned_tokens", ""},
                {"sampler_priority", new List<string> { "temperature", "dynamic_temperature", "quadratic_sampling", "top_k", "top_p", "typical_p", "epsilon_cutoff", "eta_cutoff", "tfs", "top_a", "min_p", "mirostat" }},
                {"samplers", new List<string> { "top_k", "tfs_z", "typical_p", "top_p", "min_p", "temperature" }},
                {"ignore_eos_token", false},
                {"spaces_between_special_tokens", true},
                {"speculative_ngram", false},
                {"sampler_order", new List<int> { 6, 0, 1, 3, 4, 2, 5 }},
                //{"logit_bias", new List<string> { "" }},
                {"rep_pen_size", 2048},
                {"genamt", 500},
               ...
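The `no_repeat_ngram_size` entry above is one of the more drastic settings. Here is a minimal Python sketch of the idea. It assumes token IDs index a plain list of logits, and the function names are made up for illustration, not taken from any backend. Any token that would complete an n-gram already present in the context is banned. At size 2 this means every pair of consecutive tokens can appear only once, so even very common pairs get blocked after their first use.

```python
def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Token IDs that would complete an n-gram already present in `tokens`."""
    if n < 1 or len(tokens) < n:
        return set()
    # The last n-1 tokens are the prefix that the next token would extend.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def apply_no_repeat_ngram(logits: list[float], tokens: list[int], n: int) -> list[float]:
    """Set banned tokens' logits to -inf so they can never be sampled."""
    banned = banned_next_tokens(tokens, n)
    return [float("-inf") if i in banned else x for i, x in enumerate(logits)]


# Context "5 7 5": with n=2, token 7 is banned because "5 7" already occurred.
print(banned_next_tokens([5, 7, 5], 2))  # {7}
```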

@bluuwhale Since fp32 calculations only offer a slight increase in quality (at least I assume so), I believe you'd be interested in this feedback as well.

I've been messing around with the sampler settings in Ooba, and I believe I've found a good set that balances adherence and creativity.

Here's what I've got:

# Main Samplers
top_k: 45 # Slightly restrictive compared to the norm, but not TOO much. It usually gives a nice balance.
min_p: 0.05-0.075 # Adjust to taste.
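Top K keeps a fixed number of the most likely tokens. Min P keeps every token whose probability is at least `min_p` times that of the most likely token, so it adapts to how confident the model is. Below is a minimal Python sketch over a plain list of logits. The helper names are illustrative and are not Ooba's code. For comparison, the `min_p` of 0.8 in the first post's settings keeps only tokens at least 80% as likely as the top one, which is close to greedy decoding.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(logits: list[float], k: int) -> list[float]:
    """Keep the k highest logits (ties at the cutoff are kept too); mask the rest."""
    if k <= 0 or k >= len(logits):
        return list(logits)  # 0 (or a k larger than the vocab) disables the filter
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else float("-inf") for x in logits]


def min_p_filter(logits: list[float], min_p: float) -> list[float]:
    """Keep tokens whose probability is at least min_p times the top token's."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [x if p >= threshold else float("-inf") for x, p in zip(logits, probs)]


logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print([x > float("-inf") for x in min_p_filter(logits, 0.075)])  # [True, True, True, True]
print([x > float("-inf") for x in min_p_filter(logits, 0.8)])    # [True, False, False, False]
```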

# Main Penalties - Somewhat customizable. Adjust to taste.
rep_pen: 1.01-1.05 # Keeps the AI from dwelling on one topic for too long and helps it move the story forward.
rep_pen_range: 2048,4096 # The number of recent tokens rep_pen looks at. 2048 and 4096 are both generally good values, but 2048 is usually best, especially with larger rep_pen values.
pres_pen: 0.03-1.1 # Helps encourage the use of synonyms.
encoder_pen: 1-1.03 # If I understand its description correctly, it should help the AI better adhere to the writing style of the Greeting/Example Messages/Context.
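This minimal Python sketch shows roughly how the two main penalties act on the logits of tokens that have already appeared. It makes three assumptions: the common CTRL-style repetition penalty (divide positive logits, multiply negative ones), an OpenAI-style flat presence penalty, and a single `rep_pen_range` window shared by both. Check your backend for its exact behavior. The function and argument names are illustrative.

```python
def apply_penalties(
    logits: list[float],
    context: list[int],
    rep_pen: float = 1.0,
    rep_pen_range: int = 2048,
    pres_pen: float = 0.0,
) -> list[float]:
    """Penalize every token ID that appears in the last `rep_pen_range` context tokens."""
    # Assumption: a range of 0 means "use the whole context".
    window = context[-rep_pen_range:] if rep_pen_range > 0 else context
    out = list(logits)
    for tok in set(window):
        x = out[tok]
        # rep_pen is multiplicative: shrink positive logits, push negative ones lower.
        x = x / rep_pen if x > 0 else x * rep_pen
        # pres_pen is additive: a flat cost for having appeared at all, however often.
        out[tok] = x - pres_pen
    return out


# Tokens 0 (logit 2.0) and 1 (logit -1.0) were seen recently; token 2 was not.
print(apply_penalties([2.0, -1.0, 0.5], [0, 1, 0], rep_pen=1.05, pres_pen=0.03))
```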

# Smooth Sampling
smoothing_factor: 0.25-0.3 # Adjust to taste.
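Smooth sampling (also called quadratic sampling) reshapes the logits instead of cutting them off. This minimal sketch follows the formula as I understand it from Ooba's implementation with the default `smoothing_curve` of 1, where each logit becomes top - factor * (logit - top)^2. Other curve values add a cubic term, which is left out here. The function name is illustrative.

```python
def quadratic_smoothing(logits: list[float], smoothing_factor: float) -> list[float]:
    """Bend logits along a parabola centred on the top logit (smoothing_curve = 1 only)."""
    top = max(logits)
    # Tokens close to the top lose little; distant ones are pushed down quadratically.
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]


print(quadratic_smoothing([0.0, -1.0, -5.0], 0.3))
```

With a factor of 0.3, a token 1 logit below the top ends up only 0.3 below it, while a token 5 below ends up 7.5 below. The near-top candidates move closer together and the tail is cut harder.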

# DRY Rep. Pen.
mult: 0.8
base: 1.75
len: 2
seq_break: ["\n", ":", "\"", "*","`",";","(","{","[","]","}",")","+","="] # Adjusted to account for Lorebook/injected-prompt formats as well as role-play `THOUGHTS` formatting
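DRY penalizes a token when choosing it would extend a sequence that already appeared earlier in the context. This minimal sketch assumes the formula from the DRY description: penalty = multiplier * base^(match length - allowed length), applied once the match reaches the allowed length. Here `mult`, `base` and `len` above map to `multiplier`, `base` and `allowed_len`. A match cannot extend across a sequence breaker, which is why the list above includes formatting characters. The naive matching loop is quadratic and only meant for illustration.

```python
def dry_match_lengths(tokens: list[str], breakers: set[str]) -> dict[str, int]:
    """For each token that followed an earlier copy of the current ending,
    return the length of the longest such repeated ending (breakers stop a match)."""
    n = len(tokens)
    best: dict[str, int] = {}
    for end in range(n - 1):  # `end` is where an earlier copy stops; tokens[end + 1] came next
        length = 0
        while (
            length <= end
            and tokens[end - length] == tokens[n - 1 - length]
            and tokens[n - 1 - length] not in breakers
        ):
            length += 1
        if length > 0:
            nxt = tokens[end + 1]
            best[nxt] = max(best.get(nxt, 0), length)
    return best


def dry_penalty(match_len: int, multiplier: float = 0.8, base: float = 1.75,
                allowed_len: int = 2) -> float:
    """Amount subtracted from a candidate's logit; grows exponentially with match length."""
    if match_len < allowed_len:
        return 0.0
    return multiplier * base ** (match_len - allowed_len)


words = ["the", "cat", "sat", ".", "the", "cat"]
matches = dry_match_lengths(words, breakers={"\n", ":", "*"})
print(matches)  # {'sat': 2}
print({tok: dry_penalty(m) for tok, m in matches.items()})  # {'sat': 0.8}
```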

# Dyna. Temp.
min: 0.5
max: 1.25
exp: 0.85
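With Dynamic Temperature, `min`, `max` and `exp` choose a temperature for each token based on how uncertain the model is. This minimal sketch assumes the entropy-based variant as I understand Ooba's implementation. The entropy of the token distribution is divided by its maximum possible value (log of the number of non-masked tokens), raised to `exp`, and used to interpolate between `min` and `max`. The function name is illustrative.

```python
import math


def dynamic_temperature(logits: list[float], min_temp: float, max_temp: float,
                        exponent: float) -> float:
    """Pick a temperature between min_temp and max_temp from the distribution's entropy."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Normalize by the largest possible entropy over the tokens still in play.
    valid = sum(1 for x in logits if x != float("-inf"))
    max_entropy = math.log(valid) if valid > 1 else 0.0
    normalized = entropy / max_entropy if max_entropy > 0 else 0.0
    return min_temp + (max_temp - min_temp) * normalized ** exponent


flat = [0.0, 0.0, 0.0, 0.0]    # model is unsure: temperature goes to max
peaked = [8.0, 0.0, 0.0, 0.0]  # model is confident: temperature stays near min
print(dynamic_temperature(flat, 0.5, 1.25, 0.85))
print(dynamic_temperature(peaked, 0.5, 1.25, 0.85))
```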

# Mirostat - I usually find Mirostat pretty trash, but thanks to Ooba's sampler priority, it's pretty great.
mode: 2
tau: 8,9.9 # Values from the Mirostat Gold and Preset Arena presets. Pick to taste.
eta: 0.1,1 # Same as above.
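Mirostat works differently from the truncation samplers. It aims for a target "surprise" (`tau`, measured here in bits as -log2 p) and adjusts its cutoff `mu` after every token at a rate set by `eta`. This is a minimal sketch of one Mirostat 2 step as I understand the algorithm. The function name and the fixed toy distribution are illustrative, and a real backend recomputes `probs` from the model at every step.

```python
import math
import random


def mirostat_v2_step(probs: list[float], mu: float, tau: float, eta: float,
                     rng: random.Random) -> tuple[int, float]:
    """Sample one token with Mirostat v2 and return (token index, updated mu)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Drop tokens whose surprise (-log2 p) exceeds mu, but always keep the top token.
    kept = [i for i in order if probs[i] > 0 and -math.log2(probs[i]) <= mu] or order[:1]
    total = sum(probs[i] for i in kept)
    choice = rng.choices(kept, weights=[probs[i] / total for i in kept], k=1)[0]
    # Surprise of the chosen token in the renormalized distribution.
    surprise = -math.log2(probs[choice] / total)
    # Feedback: move mu so the average surprise drifts toward the target tau.
    return choice, mu - eta * (surprise - tau)


rng = random.Random(0)
tau, eta = 8.0, 0.1
mu = 2 * tau  # mu conventionally starts at twice the target surprise
probs = [0.5, 0.25, 0.125, 0.125]
for _ in range(3):
    token, mu = mirostat_v2_step(probs, mu, tau, eta, rng)
    print(token, round(mu, 3))
```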

# Misc
temp_last: false # Usually you'd want this true; however, it needs to be off for the sampler priority in the last section to take effect.

# Logit Bias
# No need to set anything, just thought I'd give a shout out to this for those that need to know.
# LLMs tend to struggle with character speech quirks, and this is a godsend for fixing that problem.
# Enter the speech quirk text (for example, "nyaaa") and set its bias somewhere between 0.5 and 2, depending on how stubborn the LLM you're using is.
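Logit bias adds a fixed value to the logits of the chosen tokens before sampling. A bias of b therefore multiplies that token's odds against every unbiased token by e^b, or about 2.7x for b = 1. In this minimal sketch the token ID is made up. A real frontend would first run the quirk text through the model's tokenizer and bias the resulting token IDs.

```python
import math


def apply_logit_bias(logits: list[float], bias: dict[int, float]) -> list[float]:
    """Add a fixed bias to selected token IDs' logits before sampling."""
    out = list(logits)
    for token_id, value in bias.items():
        out[token_id] += value
    return out


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical: pretend token ID 2 is the first token of the quirk "nyaaa".
logits = [1.0, 0.5, 0.0]
before = softmax(logits)[2]
after = softmax(apply_logit_bias(logits, {2: 1.0}))[2]
print(f"{before:.3f} -> {after:.3f}")
```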

# Sampler Priority
sampler_priority: # The main goal is to imitate KoboldCpp's order, which is better suited for role-play.
  - Top K
  - Top A
  - Epsilon Cutoff
  - Eta Cutoff
  - Tail Free Sampling
  - Typical P
  - Top P
  - Min P
  - Temperature
  - Dynamic Temperature
  - Mirostat # I always thought Mirostat was kinda trash, but placing it right after (Dyna.) Temp. and before Smooth Sampling has turned it from mid to great!
  - Smooth Sampling
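Each sampler in the priority list sees the output of the one before it, so the order changes the result. This minimal sketch uses just two samplers and made-up values, and the dictionary-of-functions design is mine rather than Ooba's. Running Min P before Temperature cuts the list using the original distribution. Running a temperature above 1 first flattens the distribution, so the same Min P value keeps more tokens.

```python
import math

NEG_INF = float("-inf")


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature(logits, t):
    return [x / t for x in logits]


def min_p(logits, p):
    probs = softmax(logits)
    threshold = p * max(probs)
    return [x if q >= threshold else NEG_INF for x, q in zip(logits, probs)]


SAMPLERS = {"temperature": temperature, "min_p": min_p}


def run_priority(logits, priority, params):
    """Apply samplers one after another in the given priority order."""
    for name in priority:
        logits = SAMPLERS[name](logits, params[name])
    return logits


def survivors(logits):
    return [i for i, x in enumerate(logits) if x != NEG_INF]


logits = [2.0, 1.0, 0.0, -1.0]
params = {"min_p": 0.2, "temperature": 2.0}
print(survivors(run_priority(logits, ["min_p", "temperature"], params)))  # [0, 1]
print(survivors(run_priority(logits, ["temperature", "min_p"], params)))  # [0, 1, 2, 3]
```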

A simpler set of settings for KoboldCpp based on bluuwhale's settings on the main page:

# Main Samplers
min_p: 0.1

# Main Penalties
rep_pen: 1.01
rep_pen_range: 2048
rep_pen_slope: 0.95
presence_pen: 0.03

# DRY Rep. Pen.
mult: 2
base: 1.75
len: 2
range: 4096
seq: ["\n", ":", "\"", "*", "`", ";", "<", "(", "{", "[", "]", "}", ")", ">", "|", "+", "="] # updated for tags and instruct formats

# Dyna. Temp.
min: 0.6
max: 1.45
exp: 0.85

Casual-Autopsy changed discussion title from Thoughts to Feedback and Preset Suggestions

Hello @Casual-Autopsy,

I wanted to ask you about the sampler priority, now that ST 1.12.7 exposes even more samplers. Currently, my presets have the following:

        "repetition_penalty",
        "presence_penalty",
        "frequency_penalty",
        "dry",
        "top_k",
        "top_a",
        "epsilon_cutoff",
        "eta_cutoff",
        "tfs",
        "typical_p",
        "top_p",
        "min_p",
        "temperature",
        "dynamic_temperature",
        "mirostat",
        "quadratic_sampling",
        "xtc",
        "encoder_repetition_penalty",
        "no_repeat_ngram"

As you can see, the entries in the middle, from top_k down to quadratic_sampling, are in the correct order, but the update added the rest at the top and bottom. Does KoboldCpp have a particular order for them, too?

"sampler_priority": [
    "temperature",
    "dynamic_temperature",
    "quadratic_sampling",
    "top_k",
    "top_p",
    "typical_p",
    "epsilon_cutoff",
    "eta_cutoff",
    "tfs",
    "top_a",
    "min_p",
    "mirostat"
],
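On the KoboldCpp side, the order is given as a numeric `sampler_order` array, like the `6, 0, 1, 3, 4, 2, 5` in the first post's settings. Below is a small decoder sketch. It assumes the KoboldAI-style IDs as I understand them: 0 Top K, 1 Top A, 2 Top P, 3 TFS, 4 Typical, 5 Temperature, 6 Repetition Penalty. Verify these against the KoboldCpp API docs. Under that assumption, the array applies repetition penalty first, then Top K, Top A, TFS, Typical and Top P, with Temperature last. The array only covers these seven samplers, so by itself it does not say where newer ones such as Min P, DRY or XTC run.

```python
# Assumed KoboldAI/KoboldCpp IDs for the numeric `sampler_order` field.
KOBOLD_SAMPLER_IDS = {
    0: "top_k",
    1: "top_a",
    2: "top_p",
    3: "tfs",
    4: "typical",
    5: "temperature",
    6: "rep_pen",
}
NAME_TO_ID = {name: i for i, name in KOBOLD_SAMPLER_IDS.items()}


def decode_sampler_order(order: list[int]) -> list[str]:
    """Turn a numeric sampler_order array into sampler names."""
    return [KOBOLD_SAMPLER_IDS[i] for i in order]


def encode_sampler_order(names: list[str]) -> list[int]:
    """Turn sampler names back into the numeric array."""
    return [NAME_TO_ID[name] for name in names]


print(decode_sampler_order([6, 0, 1, 3, 4, 2, 5]))
# ['rep_pen', 'top_k', 'top_a', 'tfs', 'typical', 'top_p', 'temperature']
```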

I'm not currently using Ooba because its grammar sampler is broken, but this is the order I used:

Top K
No Repeat Ngram
Encoder Repetition Penalty
Repetition Penalty
Presence Penalty
Frequency Penalty
DRY
Top A
Epsilon Cutoff
Eta Cutoff
Tail Free Sampling
Typical P
Top P
Min P
Temperature
Dynamic Temperature
Quadratic / Smooth Sampling
Mirostat
XTC

I use Top K first for performance reasons, since I have a potato PC. The positions of Mirostat and XTC are also interchangeable with Smooth Sampling, so I suggest swapping the last three around to find what works best for you.

I'd also like to point out that I'm still learning how to make proper sampler presets, so the old one I created might not be all that great.
