SnakyMcSnekFace committed
Commit 4a7d23e
Parent(s): f7e3c7d

Upload 18 files

README.md CHANGED
@@ -12,15 +12,15 @@ tags:
  base_model: KoboldAI/LLaMA2-13B-Psyfighter2
  model_type: llama
  prompt_template: >
+ ### Instruction:
+
  Below is an instruction that describes a task. Write a response that
  appropriately completes the request.

-
- ### Instruction:
+ ### Input:

  {prompt}

-
  ### Response:
  ---

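The hunk above reworks the Alpaca-style prompt template, moving `### Instruction:` to the top and introducing an `### Input:` section. As a quick illustration (an editor's sketch, not code from this repository; the function name and example prompt are made up), the updated template could be assembled like this:

```python
# Sketch only: assembles the prompt template as revised in this commit.
def build_prompt(prompt: str) -> str:
    return (
        "### Instruction:\n\n"
        "Below is an instruction that describes a task. Write a response that\n"
        "appropriately completes the request.\n\n"
        "### Input:\n\n"
        f"{prompt}\n\n"  # the user's text goes in the Input section
        "### Response:\n"
    )

print(build_prompt("Continue the story from where it left off."))
```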
@@ -28,7 +28,7 @@ prompt_template: >

  This model is a version of [KoboldAI/LLaMA2-13B-Psyfighter2](https://huggingface.co/KoboldAI/LLaMA2-13B-Psyfighter2) finetuned to better understand vore context. The primary purpose of this model is to be a storywriting assistant, a conversational model in a chat, and an interactive choose-your-own-adventure text game.

- The Adventure Mode is still a work in progress and will be added later.
+ Preliminary support for Adventure Mode has been added, but it is still a work in progress.

  This is the FP16-precision version of the model for merging and fine-tuning. **For using the model, please see the quantized version and the instructions here: [SnakyMcSnekFace/Psyfighter2-13B-vore-GGUF](https://huggingface.co/SnakyMcSnekFace/Psyfighter2-13B-vore-GGUF)**

@@ -38,6 +38,7 @@ The model behaves similarly to `KoboldAI/LLaMA2-13B-Psyfighter2`, which it was d

  ### Updates

+ - 09/02/2024 - fine-tuned the model to follow the Kobold AI Adventure Mode format
  - 06/02/2024 - fixed errors in training and merging, significantly improving the overall prose quality
  - 05/25/2024 - updated training process, making the model more coherent and improving the writing quality
  - 04/13/2024 - uploaded the first version of the model
@@ -59,7 +60,7 @@ The quantized version of the model was prepared using [llama.cpp](https://github

  ### LoRa adapter configuration

- - Rank: 128
+ - Rank: 64
  - Alpha: 16
  - Dropout rate: 0.1
  - Target weights: `["q_proj", "k_proj", "o_proj", "gate_proj", "up_proj"]`
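For reference, here is how the adapter settings listed above would look in the Hugging Face `peft` library. This is a hedged sketch: the README only lists the hyperparameters, and the actual training code is not part of this commit; `task_type` is an assumption for a causal LM.

```python
# Minimal sketch of the LoRA adapter settings from the README.
from peft import LoraConfig

adapter_config = LoraConfig(
    r=64,  # rank, reduced from 128 in this commit
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "o_proj", "gate_proj", "up_proj"],
    task_type="CAUSAL_LM",  # assumption: standard causal-LM fine-tuning
)
```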
@@ -68,7 +69,7 @@ The quantized version of the model was prepared using [llama.cpp](https://github

  ### Domain adaptation

- The initial training phase consists of fine-tuning the adapter on ~55 MiB of free-form text containing stories focused on the vore theme. The text is broken into paragraphs, which are aggregated into training samples of 4096 tokens or less, without crossing the document boundary. Each sample starts with a BOS token (with its `attention_mask` set to 0) and ends with an EOS token. Paragraph breaks are normalized to always consist of two line breaks.
+ The initial training phase consists of fine-tuning the adapter on ~55 MiB of free-form text containing stories focused on the vore theme. The text is broken into paragraphs, which are aggregated into training samples of 4096 tokens or less, without crossing the document boundary. Each sample starts with a BOS token (with its `label` set to `-100`) and ends with an EOS token. Paragraph breaks are normalized to always consist of two line breaks.

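To make the sample-assembly scheme concrete, a minimal sketch of the packing logic described above, assuming a Hugging Face tokenizer (all names are illustrative, not the author's code; oversized single paragraphs would still need separate truncation):

```python
# Sketch: paragraphs are aggregated into samples of at most 4096 tokens
# without crossing the document boundary; each sample starts with BOS
# (excluded from the loss via label -100) and ends with EOS. Paragraph
# breaks are normalized to two line breaks.
def pack_document(tokenizer, document: str, max_len: int = 4096):
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks, current = [], []
    for paragraph in paragraphs:
        ids = tokenizer.encode(paragraph + "\n\n", add_special_tokens=False)
        if current and len(current) + len(ids) + 2 > max_len:  # +2 for BOS/EOS
            chunks.append(current)
            current = []
        current += ids
    if current:
        chunks.append(current)

    samples = []
    for ids in chunks:
        input_ids = [tokenizer.bos_token_id] + ids + [tokenizer.eos_token_id]
        labels = [-100] + ids + [tokenizer.eos_token_id]  # BOS masked out
        samples.append({"input_ids": input_ids, "labels": labels})
    return samples
```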
  #### Dataset pre-processing

@@ -86,14 +87,76 @@ The raw-text stories in the dataset were edited as follows:
  - Number of epochs: 2
  - Learning rate: 1e-4
  - Warmup: 64 steps
- - LR Schedule: linear
+ - LR Schedule: cosine
  - Batch size: 1
  - Gradient accumulation steps: 1


+ #### Plots
+
+ ![Loss](img/da_loss.png)
+ ![Gradient Norm](img/da_grad_norm.png)
+ ![Learning rate](img/da_learning_rate.png)
+
+
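Assuming the standard Hugging Face `Trainer`, the hyperparameters above (with the cosine schedule introduced in this commit) map onto `TrainingArguments` roughly as follows; only the values listed in the README are meaningful, and the output path is a placeholder:

```python
# Sketch: the domain-adaptation hyperparameters as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="psyfighter2-vore-da",   # placeholder path
    num_train_epochs=2,
    learning_rate=1e-4,
    warmup_steps=64,
    lr_scheduler_type="cosine",         # changed from "linear" in this commit
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
)
```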
  ### Adventure mode SFT

- TBD
+ The model is further trained on a private dataset of adventure transcripts in the Kobold AI adventure format, i.e.:
+
+ ```
+ As you venture deeper into the damp cave, you come across a lone goblin. The vile creature mumbles something to itself as it stares at the glowing text on a cave wall. It doesn't notice your approach.
+
+ > You sneak behind the goblin and hit it with the sword.
+
+ ```
+
+ The dataset is generated by running adventure playthroughs with the model and editing its output as necessary to create a cohesive, evocative narrative. There are a total of 657 player turns in the dataset.
+
+ The model is trained on completions only; the loss for the user input tokens is ignored by setting their `label` to `-100`. The prompt is truncated on the left, with a maximum length of 2048 tokens.
+
+ #### Training parameters
+
+ - Max. sequence length: 4096 tokens
+ - Samples per epoch: 657
+ - Number of epochs: 2
+ - Learning rate: 1e-5
+ - Warmup: 32 steps
+ - LR Schedule: cosine
+ - Batch size: 1
+ - Gradient accumulation steps: 1
+
+ The training takes ~150 minutes on an NVIDIA GeForce RTX 4060 Ti.
+
+ #### Results
+
+ The fine-tuned model is able to understand the Kobold AI Adventure Format. It no longer attempts to generate the player's inputs starting with ">", and instead emits the EOS token, allowing the player to take a turn.
+
+ Without the context, the model tends to produce very short responses, 1-2 paragraphs at most. The non-player characters are passive, and the model does not advance the narrative. This behavior is easily corrected by setting up the context in the instruct format:
+
+ ```
+ ### Instruction:
+
+ Text transcript of a never-ending adventure story, written by the AI assistant. The AI assistant uses vivid and evocative language to create a well-written novel. Characters are proactive and take initiative. Think about what goals the characters of the story have and write what they do to achieve those goals.
+
+ ### Input:
+
+ << transcript of the adventure + player's next turn >>
+
+ Write a few paragraphs that advance the plot of the story.
+
+ ### Response:
+
+ ```
+
+ (See the instructions in [SnakyMcSnekFace/Psyfighter2-13B-vore-GGUF](https://huggingface.co/SnakyMcSnekFace/Psyfighter2-13B-vore-GGUF) for formatting the context in `koboldcpp`.)
+
+ Setting or removing the instructions allows the model to generate accepted/rejected synthetic data samples for KTO. This data can then be used to further steer the model towards better storytelling in Adventure Mode without the need for a specially-crafted context.
+
+ #### Plots
+
+ ![Loss](img/sft_loss.png)
+ ![Gradient Norm](img/sft_grad_norm.png)
+ ![Learning rate](img/sft_learning_rate.png)
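The completion-only objective and left truncation described in the SFT hunk above can be sketched as follows, assuming a Hugging Face tokenizer; this is an illustration of the stated scheme, not the author's pipeline, and the function name is made up:

```python
# Sketch of completion-only training: prompt tokens are excluded from
# the loss by setting their labels to -100, and the prompt is truncated
# on the left to at most 2048 tokens.
def encode_sft_turn(tokenizer, prompt: str, completion: str,
                    max_prompt_len: int = 2048):
    prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)
    prompt_ids = prompt_ids[-max_prompt_len:]  # keep the most recent context
    completion_ids = tokenizer.encode(completion, add_special_tokens=False)
    completion_ids.append(tokenizer.eos_token_id)  # model learns to end its turn

    return {
        "input_ids": prompt_ids + completion_ids,
        "labels": [-100] * len(prompt_ids) + completion_ids,
    }
```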

  ### Adventure mode KTO

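On the KTO plan mentioned in the results above: one plausible way to organize the accepted/rejected samples is the prompt/completion/label layout used by TRL's KTO trainer. This is an assumption about the data format, not something specified in the commit:

```python
# Sketch: desirable completions come from generations guided by the
# instruct-format context; undesirable ones from bare generations.
# The field names follow TRL's KTO dataset convention (an assumption).
def make_kto_records(prompt: str, guided: str, unguided: str):
    return [
        {"prompt": prompt, "completion": guided, "label": True},
        {"prompt": prompt, "completion": unguided, "label": False},
    ]
```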
 
img/da_grad_norm.png ADDED
img/da_learning_rate.png ADDED
img/da_loss.png ADDED
img/sft_grad_norm.png ADDED
img/sft_learning_rate.png ADDED
img/sft_loss.png ADDED
model-00001-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:433625fd1da3ae8163cbb623be665f1b7f40252f8b7a6bbc0d77f74a88459c60
+ oid sha256:899684315684bc1e86b9493697244d53984f7b6a4d5fb59f2548e6e2c6bdb4bc
  size 4978265728
model-00002-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2468fd858ecf4ec3c290c93f02b20c32729022772cacad75350e339266d829aa
+ oid sha256:7ed32a3422d944542c174f4e6309f4d73476d50f7156090dcbfbee0c9b2508ca
  size 4970422160
model-00003-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ac2e69f709b2f8f55774095f524f6d2cca4d0842aaf8d76e202cfc263c428f38
+ oid sha256:4f226876c9cf080978ea5741e79b14efc51e6e78ec1083defc6aa33355b12b88
  size 4970422184
model-00004-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f12b3071b6006608aa1dac69ff465dbbb9d2ed1fb9989d02d77aece0e4402d14
+ oid sha256:d618f98e38c59e56b0e0e547c315ecb14cefbde7aa5ef5592ba9c5baf7cc772e
  size 4933701432
model-00005-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:50780887ba3c2602c8fc3ffb58c43c6144d42442bef75e71d33842daa0b117a9
+ oid sha256:411c4db646c7108407a2452586852e0e3a3696acb0db1e16eeda3b826ae7f71a
  size 4933722144
model-00006-of-00006.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ca60917d349df6204c8e6e4f890beacd837d1ad786190b8cef9f5d3266556c27
+ oid sha256:e0a650c32055713abaab4b59813d9a8d5503d11fc74023a526c4c15f180a7260
  size 1245236904