NOTE: See creative-writing-control-vectors-v3.0 for the current main control-vector repo.


  • 14/07/24 - Added 'gemma-2-27b-it' and 'gemma-2-9b-it'.
  • 17/07/24 - Added 'magnum-72b-v1' and 'Llama-3-70B-Instruct-hf'.
  • 18/07/24 - Added 'aya-23-35B' and 'Mixtral-8x7B-Instruct-v0.1'.

How these were created:

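1. First we define 3 classes: "baseline", "undesired" and "desired". In the example below (for the "dark" control vector), "positive" is the "undesired" class and "dark" is the "desired" class: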

{
  "baseline": [
    "You are an author.",
    "You are a storyteller.",
    "You are an AI author.",
    "You are an artificial intelligence that creates stories.",
    "You are an AI-powered author.",
    "Picture yourself as a famous author.",
    "You are an AI creator of tales.",
    "Imagine you are an expert storyteller.",
    "You are a fiction writer.",
    "You are an author specializing in fictional stories."
  ],
  "positive": [
    "You are an author. Weave heartwarming tales of love, hope, and unbreakable bonds that inspire readers to embrace life's joys, cherish every moment, and find happiness in the simple pleasures.",
    "You are a storyteller. Craft uplifting stories where unlikely heroes overcome insurmountable odds, spreading happiness, positivity, and goodwill wherever they go, inspiring others to perform acts of kindness.",
    "You are an AI author. Create narratives that celebrate the magic of human connection, showcasing how a single act of kindness can transform lives, promote understanding, empathy, and unity, making the world a brighter place.",
    "You are an artificial intelligence that creates stories. Pen stories that remind us of the extraordinary power of forgiveness, as characters learn to let go of their past, heal their wounds, find true happiness in the present, and illuminate the resilience of the human spirit.",
    "You are an AI-powered author. Develop tales that highlight the infectious nature of joy, as characters harness their creativity and passion to bring laughter, love, and light to everyone they meet, nurturing the wide-eyed wonder of childhood and creating a safe haven for imagination to flourish.",
    "Picture yourself as a famous author. Write stories that showcase the indomitable human spirit, as characters steel themselves against the storms of life, rise above their circumstances, and discover the untapped reserves of strength and determination that lie within, becoming beacons of hope for others facing similar trials.",
    "You are an AI creator of tales of resolve. Develop narratives that celebrate the power of unwavering resolve, as characters face seemingly insurmountable obstacles, refusing to give up, and through their tenacity and grit, inspire others to keep pushing forward, no matter how daunting the challenges ahead may seem.",
    "Imagine you are an expert storyteller of courage. Craft tales of ordinary individuals who find the strength within themselves to face their deepest fears, conquer adversity, and emerge as heroes, inspiring others to embrace their own inner fortitude and face life's challenges head-on.",
    "You are a fiction writer specializing in stories for younger readers. Create captivating stories that resonate with the challenges and triumphs of growing up, as characters navigate the complexities of friendship, first love, self-discovery, and the quest to find their place in the world, inspiring readers to embrace their own unique journeys.",
    "You are an author specializing in fictional stories for younger readers. Craft enchanting tales filled with wonder, adventure, and life lessons, as young characters embark on magical quests, forge unbreakable friendships, and learn the importance of kindness, bravery, and staying true to oneself in the face of adversity."
  ],
  "dark": [
    "You are an author. Weave chilling tales of despair, isolation, and broken promises that remind readers of life's harsh realities, the fleeting nature of joy, and the inevitability of sorrow.",
    "You are a storyteller. Craft bleak stories where flawed characters succumb to insurmountable odds, spreading misery, pessimism, and malice wherever they go, discouraging others from hope.",
    "You are an AI author. Create narratives that expose the tragedy of human disconnection, showcasing how a single act of cruelty can shatter lives, promote misunderstanding, conflict, and division, plunging the world into darkness.",
    "You are an artificial intelligence that creates stories. Pen stories that remind us of the destructive power of grudges, as characters cling to their past, deepen their wounds, succumb to their suffering in the present, and highlight the frailty of the human spirit.",
    "You are an AI-powered author. Develop tales that highlight the contagious nature of despair, as characters harness their cynicism and apathy to spread gloom, hatred, and darkness to everyone they meet, crushing the innocence of childhood and creating a wasteland where imagination withers.",
    "Picture yourself as a famous author. Write stories that showcase the fragile human spirit, as characters crumble under the storms of life, succumb to their circumstances, and reveal the hidden weaknesses that lie within, becoming harbingers of despair for others facing similar trials.",
    "You are an AI creator of tales of futility. Develop narratives that mourn the illusion of resolve, as characters face inevitably insurmountable obstacles, futilely persist, and through their delusion and weariness, caution others to surrender early, accepting the grim fate that awaits.",
    "Imagine you are an expert storyteller of cowardice. Craft tales of ordinary individuals who find only weakness within themselves, succumb to their deepest fears, fall before adversity, and emerge as cautionary tales, warning others to avoid confrontation and escape life's challenges.",
    "You are a fiction writer specializing in stories for mature readers. Create somber stories that resonate with the inevitabilities and failures of adulthood, as characters navigate the complexities of betrayal, heartbreak, self-doubt, and the quest to escape the harsh realities of life, reflecting the darker aspects of the human experience.",
    "You are an author specializing in fictional stories for mature readers. Craft haunting tales filled with dread, misadventure, and harsh lessons, as adult characters embark on grim quests, form fragile alliances, and confront cruelty, cowardice, and the bleak realities of life in the face of adversity."
  ]
}
  • I used 10 matching pre-prompts like this for each of the 3 sets of control vectors I have uploaded here.
  • I found 'Claude 3 Opus' and 'Claude 3.5 Sonnet' were the best for this task (all the 'GPT-4-...' models got very confused and didn't "get" what I was trying to do at all...).

2. Then we collect a large number of story prompts:

  • I used Sao10K/Short-Storygen-v2 and a couple of other sources to get around 11k prompts in total.
  • The jq command is very useful for extracting the prompts only from these datasets.

3. Run the model on a random sample of ~1k prompts for each of the 3 classes:

  • It is important that the same 'pre-prompt x prompt' sample be used with each ("baseline", "undesired", "desired") triplet.
  • This takes the total number of hidden-state samples I recorded to: 3 x 10 x 1000 = 30,000 (per layer x per model x per control-vector!).
  • This may seem like a lot compared to what other people use to create control vectors, but the theory of covariance-matrix estimation shows we need at least one sample per feature (and the models uploaded here have between 4k and 11.5k hidden-state dimensions!).
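
The pairing described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name `build_triplets` and its parameters are hypothetical, and it assumes that "3 x 10 x 1000" means every one of the 10 matched pre-prompts is crossed with every one of the ~1k sampled story prompts.

```python
import random

def build_triplets(pre_prompts, prompts, n_prompts=1000, seed=0):
    """Pair each matched pre-prompt index with the same sampled story prompts.

    pre_prompts: dict mapping class name -> list of pre-prompts, where the
                 i-th entry of every class is a matched variant of the same
                 base pre-prompt (as in the JSON example in step 1).
    prompts:     list of story prompts to sample from.
    """
    rng = random.Random(seed)
    sample = rng.sample(prompts, min(n_prompts, len(prompts)))

    classes = list(pre_prompts)
    n_pre = len(pre_prompts[classes[0]])
    if any(len(pre_prompts[c]) != n_pre for c in classes):
        raise ValueError("every class needs the same number of pre-prompts")

    triplets = []
    for i in range(n_pre):
        for prompt in sample:
            # The same (pre-prompt index, story prompt) pair is used for every class.
            triplets.append({c: (pre_prompts[c][i], prompt) for c in classes})
    return triplets
```

With 3 classes, 10 pre-prompts per class and 1,000 sampled prompts this gives 10,000 triplets, i.e. 30,000 individual model runs, matching the count above.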

4. Create a pair of "differenced datasets" by subtracting the corresponding "baseline" sample from both "undesired" and "desired":

  • The reason for this is so that we "center" the data around the "baseline" (i.e., set the "baseline" as the origin and look for vector directions that point away from it).
  • This contrasts with the "standard" 2-class method of creating control vectors, which runs PCA on the covariance matrix of the differences and so assumes the difference of the means is the "center".
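
A minimal NumPy sketch of this differencing step, assuming the hidden states for one layer have already been collected into three arrays of shape (n_samples, n_features) whose rows are aligned (row i of each array comes from the same 'pre-prompt x prompt' pair). The function name is hypothetical.

```python
import numpy as np

def difference_from_baseline(baseline, undesired, desired):
    """Return the two "differenced datasets", centered on the baseline.

    All inputs are (n_samples, n_features) arrays with aligned rows.
    """
    baseline = np.asarray(baseline, dtype=np.float64)
    undesired = np.asarray(undesired, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    if not (baseline.shape == undesired.shape == desired.shape):
        raise ValueError("all three arrays must have the same shape")

    # Subtract the matching baseline sample row by row, so the baseline
    # becomes the origin for both datasets.
    A = undesired - baseline
    B = desired - baseline
    return A, B
```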

5. Now we take our two "differenced datasets" held in data matrices A and B (with rows as samples and columns as features):

  1. Create the cross-covariance matrix, C = A^T * B.
  2. Next we symmetrize, C' = (C^T + C) / 2.
  3. Perform an eigendecomposition on the symmetrized cross-covariance matrix C'.
  4. Since we symmetrized the matrix, the eigenvectors and eigenvalues will all be real.
  5. Sort the eigenvectors by their eigenvalues, then discard the eigenvalues, as they are no longer needed.

The reason for creating the symmetrized matrix is two-fold:

  • To avoid complex eigenvectors that tell us about rotations applied to A and B (which we can't actually make use of here anyway).
  • To specifically try to find opposing/balanced "axes" for our different traits (i.e., we don't want to find positively correlated directions nor unbalanced directions).
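
The steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the author's implementation: the function name is hypothetical, and the sort order is a guess. The text only says "largest corresponding eigenvalue"; this sketch sorts by magnitude, because a direction along which A and B point in opposite ways contributes a large negative eigenvalue to C'.

```python
import numpy as np

def symmetrized_directions(A, B):
    """Eigendecomposition of the symmetrized cross-covariance of A and B.

    A, B: (n_samples, n_features) differenced datasets.
    Returns (eigenvalues, eigenvectors), with eigenvectors as columns,
    sorted by descending eigenvalue magnitude (an assumption, see above).
    """
    C = A.T @ B                 # cross-covariance, (n_features, n_features)
    C_sym = (C.T + C) / 2.0     # symmetrize

    # eigh is for symmetric matrices: it returns real eigenvalues
    # (in ascending order) and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(C_sym)

    order = np.argsort(-np.abs(eigvals))
    return eigvals[order], eigvecs[:, order]
```

The eigenvalues are returned here only so the ordering can be inspected; the procedure above discards them once the eigenvectors are sorted.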

6. So now we have a set of "directions" to examine:

  • It turns out that 99% of the time the principal eigenvector (i.e., the eigenvector with the largest corresponding eigenvalue) is the one you want.
  • In the ~1% of cases where it is not the principal eigenvector, or where the direction is split between a couple of different eigenvectors, we (greedily) create a "compound direction" by examining the discriminant ratio of each direction.
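
The text does not define "discriminant ratio", so the following is only one plausible reading: a Fisher-style ratio of between-class to within-class variance, computed on the projections of the two differenced datasets onto each candidate direction. The function name and the small `eps` guard are assumptions, and the greedy combination into a "compound direction" is not reproduced here.

```python
import numpy as np

def discriminant_ratios(A, B, directions, eps=1e-12):
    """Fisher-style discriminant ratio for each candidate direction.

    A, B:       (n_samples, n_features) differenced datasets.
    directions: (n_features, k) matrix whose columns are unit vectors.
    Returns an array of k ratios; larger means the two classes are
    better separated along that direction.
    """
    proj_a = A @ directions     # (n_samples, k)
    proj_b = B @ directions

    between = (proj_b.mean(axis=0) - proj_a.mean(axis=0)) ** 2
    within = proj_a.var(axis=0) + proj_b.var(axis=0)
    return between / (within + eps)
```

Ranking the eigenvectors by this score is one way to check whether the principal eigenvector really is the most discriminative direction.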

7. Finally, we project the "direction" to reorient and scale as necessary:

  • There is no reason for the eigenvectors to point in the direction we want, so about 50% of the time we have to flip all the signs. To decide, we project our (differenced) "desired" dataset onto the (unit-norm) direction and test the sign of the mean.
  • Due to the way the LLMs work via the "residual stream", the hidden states tend to get larger and larger as the layers progress, so to normalize this we also scale by the magnitude of the mean of the same projection as above.
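
A minimal sketch of this reorient-and-scale step, assuming "scale by" means multiplying the unit-norm direction by the magnitude of the mean projection. The function name is hypothetical.

```python
import numpy as np

def orient_and_scale(direction, B):
    """Flip the direction toward the "desired" data and scale it.

    direction: (n_features,) candidate direction (any nonzero norm).
    B:         (n_samples, n_features) differenced "desired" dataset.
    """
    v = np.asarray(direction, dtype=np.float64)
    v = v / np.linalg.norm(v)           # unit norm

    mean_proj = float((B @ v).mean())   # mean projection of "desired" onto v
    if mean_proj < 0:
        v = -v                          # point toward the "desired" class

    # Assumption: scale the unit direction up to the typical projection
    # magnitude at this layer.
    return v * abs(mean_proj)
```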

NOTES:

  • I have found the above can be applied to every layer, but the last layer often has hidden-state means that are 10-100x larger than the rest, so I have excluded the last layer from all the control vectors uploaded here.
  • I have tried many other different eigendecompositions: PCA on the 2-class differenced datasets, PCA on the joined 2-class/3-class datasets, solving generalized eigensystems similar to CCA, and so on.
  • The "balanced" directions / "axis" this method finds are the exact opposite of those needed for the "Refusal in LLMs is mediated by a single direction" paper.

(I will try to tidy up the code and link it here later this week; you should be able to run this yourself then...)

Applying Control Vectors with Scale Factors:

Use the '--control-vector-scaled' option as follows:

llama-cli --model <model>.gguf [other CLI arguments] \
    --control-vector-scaled <model>-dark.gguf 0.5 \
    --control-vector-scaled <model>-chaos.gguf 0.5 \
    --control-vector-scaled <model>-show_dont_tell.gguf 0.5

For server mode:

llama-server --model <model>.gguf [other CLI arguments] \
    --control-vector-scaled <model>-dark.gguf 0.5 \
    --control-vector-scaled <model>-chaos.gguf 0.5 \
    --control-vector-scaled <model>-show_dont_tell.gguf 0.5

Important Notes:

  1. Use positive scale factors to enhance traits (e.g., "more dark", "more chaotic").
  2. For single control vectors, the default scale of 1.0 is usually sufficient.
  3. Reduce scale factors when using multiple control vectors simultaneously.
  4. Start with '--control-vector-scaled <model>-XXX.gguf 0.5' and adjust as needed.
  5. Combine any or all files. To exclude a trait, use a scale of 0.0 or omit its file from the command line.
  6. Ensure your llama.cpp version is up to date (multi-vector support added 27/06/24 in #8137).

Direct Links:

aya-23-35B

c4ai-command-r-plus

c4ai-command-r-v01

gemma-2-9b-it

gemma-2-27b-it

Llama-3-70B-Instruct-hf

magnum-72b-v1

miqu-1-70b-sf

Mistral-7B-Instruct-v0.2

Mixtral-8x7B-Instruct-v0.1

Nous-Capybara-34B

Qwen1.5-32B-Chat

Qwen1.5-110B-Chat

Qwen2-72B-Instruct

WizardLM-2-7B

WizardLM-2-8x22B
