---
language:
- en
license: mit
library_name: transformers
tags:
- mergekit
- merge
- unsloth
base_model:
- LeroyDyer/Mixtral_AI_CyberBrain_2.0
- ezelikman/quietstar-8-ahead
---
Actually, it's working. It just needs training data!
This project is implemented by simply patching the base Mistral implementation in Huggingface transformers using a new modeling_mistral.py and a new configuration_mistral.py and otherwise applying standard transformers features (e.g. the default Trainer).
That is: first clone the latest transformers repository.
Go into the src/transformers/models/mistral folder and replace modeling_mistral.py with the patched version.
Then install from the cloned folder: pip install ./transformers
After that, the model can be loaded normally for training:
```
from unsloth import FastLanguageModel
import torch
from transformers import AutoTokenizer

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",  # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
]  # More models at https://huggingface.co/unsloth

model_id = "LeroyDyer/Mixtral_AI_CyberBrain_3.0"  # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
tokenizer_id = model_id  # Assumption: the tokenizer comes from the same repo; change if needed

# FastLanguageModel.from_pretrained returns a (model, tokenizer) pair;
# the tokenizer is reloaded below with explicit padding settings.
model, _ = FastLanguageModel.from_pretrained(
    model_name = model_id,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # trust_remote_code = True,
    ignore_mismatched_sizes = True,
    # Quiet-STaR head options, presumably consumed by the patched Mistral code
    merged_talk_heads = True,
    merged_lm_and_talk_heads = False,
    merged_lm_and_think_heads = True,
    use_concat_talk_head = True,
    use_shallow_think = True,
    use_shallow_talk = False,
    use_complex_think_head = False,
    use_complex_talk_head = True,
    use_weighted_talk_head = True,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, truncation=True, padding_side="right")
tokenizer.pad_token_id = tokenizer.eos_token_id  # reuse EOS as the padding token
model.tokenizer = tokenizer
model.train()  # switch the model to training mode
```
Right now modeling_mistral.py still has problems loading remotely, hence the hacky workaround above. Once that is fixed, the model should load normally.
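Until remote loading works, the manual patch steps from the top of this card could be scripted. The following is only a minimal sketch: the helper names `patch_mistral` and `install_checkout` are hypothetical, and it assumes the clone uses the standard transformers source layout (`src/transformers/models/mistral`).

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Assumed location of the Mistral model code inside a transformers source checkout.
MISTRAL_SUBDIR = Path("src") / "transformers" / "models" / "mistral"


def patch_mistral(checkout_dir, patched_file):
    """Copy a patched modeling_mistral.py over the stock one and return its new path."""
    target_dir = Path(checkout_dir) / MISTRAL_SUBDIR
    if not target_dir.is_dir():
        raise FileNotFoundError(f"not a transformers checkout: {target_dir} is missing")
    destination = target_dir / "modeling_mistral.py"
    shutil.copyfile(patched_file, destination)
    return destination


def install_checkout(checkout_dir):
    """Install the (patched) checkout into the current environment with pip."""
    subprocess.run([sys.executable, "-m", "pip", "install", str(checkout_dir)], check=True)
```

Typical use would be `patch_mistral("transformers", "modeling_mistral.py")` followed by `install_checkout("transformers")`, run from the directory that contains the clone.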
# merge
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
Yes, multiple versions of this model were merged in attempts to grab the necessary tensors,
but somehow it did not build, as some parameters were not loading; i.e. it would not load the config file! Hopefully this will be rectified soon so that remote loading works, enabling enhanced training.
The model was trained to perfection, so it still works fine!
The LoRA was made so that it can later be loaded with the model for further training of the affected tensors.
## Merge Details
### Merge Method
This model was merged using the SLERP merge method.
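For readers unfamiliar with SLERP (spherical linear interpolation), here is a minimal pure-Python sketch of the idea on two flat vectors. It is an illustration under simplifying assumptions, not mergekit's actual implementation, which operates on full model tensors. The function name `slerp` and the `eps` threshold are choices made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between vectors v0 and v1 (lists of floats) at fraction t."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for numerical safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly (anti)parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

With `t = 0` the result equals the first vector and with `t = 1` the second. In mergekit's convention, as far as its documentation describes it, `t = 0` corresponds to the base model.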
### Models Merged
The following models were included in the merge:
* [LeroyDyer/Mixtral_AI_CyberBrain_2.0](https://huggingface.co/LeroyDyer/Mixtral_AI_CyberBrain_2.0)
* [ezelikman/quietstar-8-ahead](https://huggingface.co/ezelikman/quietstar-8-ahead)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: LeroyDyer/Mixtral_AI_CyberBrain_2.0
        layer_range: [0, 32]
      - model: ezelikman/quietstar-8-ahead
        layer_range: [0, 32]
# or, the equivalent models: syntax:
# models:
#   - model: mistralai/Mistral-7B-Instruct-v0.2
# LARGER MODEL MUST BE BASE or
# BASE MODEL MUST BE THE TOKENIZER YOU WISH TO ADOPT
# so for models with customized processes they must be the base model
# If the base model has remote code then this must be collected and added
# to the repo afterwards and the config file adjusted to allow for automapping to your new repo
#   - model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
merge_method: slerp
base_model: ezelikman/quietstar-8-ahead
parameters:
  t:
    - filter: self_attn
      value: [0.3, 0.6, 0.3786, 0.6, 0.6]
    - filter: mlp
      value: [0.7, 0.4, 0.6, 0.4, 0.7]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
```
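The five-element `value` lists in the configuration are, as far as mergekit's documentation describes them, gradients: anchor values spread evenly across layer depth and linearly interpolated in between. The sketch below shows how such a list could map to a per-layer `t` for the 32 layers in this merge. The helper name `gradient_value` and the exact normalization (`layer_idx / (num_layers - 1)`) are assumptions made for illustration.

```python
def gradient_value(values, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values across layer depth."""
    if num_layers < 2:
        return float(values[0])
    position = layer_idx / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
    scaled = position * (len(values) - 1)
    lower = int(scaled)
    upper = min(lower + 1, len(values) - 1)
    frac = scaled - lower
    return (1 - frac) * values[lower] + frac * values[upper]


# Anchor lists copied from the YAML configuration in this card.
SELF_ATTN_T = [0.3, 0.6, 0.3786, 0.6, 0.6]
MLP_T = [0.7, 0.4, 0.6, 0.4, 0.7]

for layer in (0, 8, 16, 24, 31):
    attn_t = gradient_value(SELF_ATTN_T, layer, 32)
    mlp_t = gradient_value(MLP_T, layer, 32)
    print(layer, round(attn_t, 4), round(mlp_t, 4))
```

Under this reading, the attention and MLP tensors are blended with different weights at different depths, while all other tensors use the fixed fallback of 0.5.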