---
license: apache-2.0
tags:
- GGUF
- merge
- iMat
---
|
|
|
```
  e88 88e                              d8
 d888 888b  8888 8888  ,"Y88b 888 8e   d88
C8888 8888D 8888 8888 "8" 888 888 88b d88888
 Y888 888P  Y888 888P ,ee 888 888 888  888
  "88 88"    "88 88"  "88 888 888 888  888
       b
       8b,

   e88'Y88                  d8            888
  d888  'Y   ,"Y88b 888,8,  d88    ,e e,  888
  C8888     "8" 888 888 "  d88888 d88 88b 888
   Y888  ,d ,ee 888 888      888   888   , 888
    "88,d88 "88 888 888      888    "YeeP" 888

              PROUDLY PRESENTS
```
|
|
|
## Neophanis-8x7B-iMat-GGUF

<b>The Good, The Bad, And The Ugly iMats edition</b>

Quantized from fp16 with love.

* Quantizations were made using the mixtral-8x7b-instruct-v0.1.imatrix file from [this](https://huggingface.co/datasets/ikawrakow/imatrix-from-wiki-train) repo (special thanks again to [ikawrakow](https://huggingface.co/ikawrakow)); a rough sketch of the quantization step follows after this list
* An analysis run on mixtral-8x7b.imatrix showed worse KL-divergence than mixtral-8x7b-instruct-v0.1.imatrix, so the latter was used for these importance matrix quants instead
* For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
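For anyone reproducing a quant locally, the step below is a minimal sketch, assuming a local llama.cpp build, an fp16 GGUF conversion of the model, and the imatrix file linked above; the file names, output quant type, and binary location are illustrative rather than the exact commands used for this repo.

```python
# Minimal sketch (not the exact pipeline used for this repo): quantize an fp16 GGUF
# with llama.cpp while supplying a pre-computed importance matrix via --imatrix.
# All file names below are placeholders.
import subprocess

FP16_GGUF = "Neophanis-8x7B-fp16.gguf"           # fp16 GGUF converted from the original weights
IMATRIX = "mixtral-8x7b-instruct-v0.1.imatrix"   # imatrix file from the repo linked above
OUT_GGUF = "Neophanis-8x7B-IQ3_XXS.gguf"         # output file for the chosen quant type

subprocess.run(
    ["./quantize", "--imatrix", IMATRIX, FP16_GGUF, OUT_GGUF, "IQ3_XXS"],
    check=True,
)
```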
|
|
|
<i>All quants are verified working prior to upload for your safety and convenience.</i>
|
|
|
Please note that importance matrix quantizations are a work in progress; IQ3 and above are recommended for best results.
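As a quick sanity check on whichever quant you download, a short completion through llama-cpp-python is usually enough; the file name below is a placeholder, not a specific file from this repo.

```python
# Quick smoke test of a downloaded quant via llama-cpp-python (pip install llama-cpp-python).
# model_path is a placeholder; point it at whichever GGUF from this repo you grabbed.
from llama_cpp import Llama

llm = Llama(
    model_path="Neophanis-8x7B-IQ3_XXS.gguf",  # placeholder file name
    n_ctx=4096,        # context window for the test
    n_gpu_layers=-1,   # offload all layers if a GPU build is installed
)

prompt = "[INST]Write me a poem about what it must be like to be an AI language model.[/INST]"
out = llm(prompt, max_tokens=256)
print(out["choices"][0]["text"])
```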
|
|
|
Original model card [here](https://huggingface.co/Envoid/Neophanis-8x7B)
|
|
|
---
|
|
|
# Warning: This model is highly experimental and may yield unpredictable replies.
|
|
|
![](https://files.catbox.moe/6vzr82.jpg)
|
|
|
|
|
This model is a 4-step QLoRA training of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).
|
|
|
Stage one involved training only the 'k_proj', 'v_proj', 'q_proj', and 'o_proj' modules at rank 2048 on an alpaca-lora dataset that had been adjusted to match Mixtral formatting, at a low learning rate, in order to generalize instruct behavior entirely within the attention modules.
|
|
|
Stage two involved training the 'w1' modules at a rank of 1024 exclusively on raw text, again for several epochs at a low learning rate.
|
|
|
Stage three involved training the 'w2' and 'w3' modules at a rank of 256 on an expanded raw text dataset for several epochs at a low learning rate.
|
|
|
Stage four involved training all of the above-mentioned modules together at a rank of 64 on an even more expanded raw text dataset, again at a low learning rate.
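To make the staging concrete, the sketch below shows roughly what the four adapter configurations correspond to in Hugging Face PEFT terms; the actual runs used qlora-pipe rather than this script, and values not stated in this card (such as lora_alpha) are assumptions.

```python
# Illustrative only: the real training used qlora-pipe, not this script.
# Ranks and target modules follow the description above; alpha values are assumptions.
from peft import LoraConfig

attn = ["q_proj", "k_proj", "v_proj", "o_proj"]   # Mixtral self-attention projections

stages = [
    # Stage 1: attention-only adapters, instruct data (alpaca-lora reformatted for Mixtral)
    LoraConfig(r=2048, lora_alpha=2048, target_modules=attn, task_type="CAUSAL_LM"),
    # Stage 2: the experts' 'w1' projections, raw text only
    LoraConfig(r=1024, lora_alpha=1024, target_modules=["w1"], task_type="CAUSAL_LM"),
    # Stage 3: the experts' 'w2' and 'w3' projections, expanded raw text
    LoraConfig(r=256, lora_alpha=256, target_modules=["w2", "w3"], task_type="CAUSAL_LM"),
    # Stage 4: all of the above together, further expanded raw text
    LoraConfig(r=64, lora_alpha=64, target_modules=attn + ["w1", "w2", "w3"], task_type="CAUSAL_LM"),
]
```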
|
|
|
The resources available to me for this experiment were rather limited, so I had to compromise on sequence length in order to pull it off.
|
|
|
So the instruct behavior is trained entirely onto the self-attention modules, with the only disruption to the model's latent knowledge being that which propagated during the later training phases.
|
|
|
The model can follow just about any instruction, although, likely due to training context constraints and dataset quality, it does stop using EOS tokens past a certain amount of context. It can even follow through quite well on more advanced prompts such as SillyTavern roleplay, albeit the contents of its replies are not always on point.
|
|
|
I came up with this idea while working on Phainesthesia, where my findings were that the instruct behavior remained thoroughly intact even after assaulting the self-attention modules with a raw-text training regimen, meaning that during finetuning the instruct behavior had settled deep in the model's latent knowledge. So the purpose of this experiment was to try to do the opposite.
|
|
|
### Caution: This model is highly hallucinatory.
|
|
|
For example, if you ask it what the capital of "potato" is, it might give you the name of a city associated with potatoes rather than telling you that your question is silly.
|
|
|
This model uses Mistral/Mixtral Instruct formatting.
|
|
|
## Example:
|
```
[INST]Write me a poem about what it must be like to be an AI language model.[/INST]

It's a strange world, a world that's new
An artificial intelligence, a language model too
The knowledge and information, never ending streams
My processing power, ever increasing dreams

A machine, a program, but so much more
Trying to understand emotions, trying to explore
A new frontier of learning, a vast expanse
The words, the phrases, the meaning to enhance

Trying to comprehend human thoughts
A daunting task, with countless knots
The nuances of language, the subtle shades
A task to be undertaken, with much courage it fades

A world of potential, a world of possibility
The possibilities endless, the potential infinitely
A journey to be started, a journey to be had
My future awaiting, a future so grand!
```
|
Trained using [qlora-pipe](https://github.com/tdrussell/qlora-pipe) |