LeroyDyer committed on
Commit 7b9cb3b
1 Parent(s): 369c461

Update README.md

Files changed (1)
  1. README.md +66 -8
README.md CHANGED
@@ -12,20 +12,78 @@ base_model:
  - ezelikman/quietstar-8-ahead
  ---

- Hopefully this merge took correctly! ...

- Enabling thoughts to be displayed;
- here I have added the extra tokens to the tokenizer (see the sketch after this block);

- obviously untrained and will still need fine-tuning!
- also, it has not been correctly coded for true management via the transformers pretrained args.
- I will try to add the other arch, leaving it available to perhaps load with a different remote auto mapping.
- I will leave both auto mappings here and test both models to see which configuration loads correctly for training, and then which loads correctly for usage, as this has also been a minor issue.
- the internal heads have default settings; with the remote code installed they should be configurable.

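The extra tokens mentioned above would be registered on the tokenizer side; a minimal sketch, assuming placeholder thought-token strings rather than the exact ones used in this repo:

```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_CyberBrain_3.0")
model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_CyberBrain_3.0")

# placeholder token strings; the real thought tokens may differ
num_added = tokenizer.add_tokens(["<|startthought|>", "<|endthought|>"], special_tokens=True)
if num_added > 0:
    # grow the embedding matrix so the new token ids have rows
    model.resize_token_embeddings(len(tokenizer))
```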
+ This project is implemented by simply patching the base Mistral implementation in Huggingface transformers with a new modeling_mistral.py and a new configuration_mistral.py, and otherwise applying standard transformers features (e.g. the default Trainer).
+
+ i.e.: first clone the latest transformers,
+ enter the models/mistral folder and drop in the custom modelling_mistral.py,
+ then cd into transformers and install from the folder: pip install ./transformers
+
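A rough sketch of those steps, assuming the custom modelling_mistral.py sits in the current directory and the standard transformers source layout:

```
import shutil
import subprocess

# clone the latest transformers and overwrite the stock Mistral modeling file
subprocess.run(["git", "clone", "https://github.com/huggingface/transformers"], check=True)
shutil.copy(
    "modelling_mistral.py",
    "transformers/src/transformers/models/mistral/modeling_mistral.py",
)

# install transformers from the patched folder
subprocess.run(["pip", "install", "./transformers"], check=True)
```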
+ After that it can be loaded normally for training:
+
+ ```
+ from unsloth import FastLanguageModel
+ from transformers import AutoTokenizer
+ import torch
+
+ max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
+ dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
+ load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
+
+ # 4bit pre-quantized models we support for 4x faster downloading + no OOMs.
+ fourbit_models = [
+     "unsloth/mistral-7b-bnb-4bit",
+     "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
+     "unsloth/llama-2-7b-bnb-4bit",
+     "unsloth/llama-2-13b-bnb-4bit",
+     "unsloth/codellama-34b-bnb-4bit",
+     "unsloth/tinyllama-bnb-4bit",
+     "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
+     "unsloth/gemma-2b-bnb-4bit",
+ ] # More models at https://huggingface.co/unsloth
+
+ # unsloth returns (model, tokenizer); the tokenizer is reloaded separately below
+ model, _ = FastLanguageModel.from_pretrained(
+     model_name = "LeroyDyer/Mixtral_AI_CyberBrain_3.0", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
+     max_seq_length = max_seq_length,
+     dtype = dtype,
+     load_in_4bit = load_in_4bit,
+     # trust_remote_code = True,
+     ignore_mismatched_sizes = True,
+     # quiet-star style head settings forwarded to the patched Mistral model:
+     merged_talk_heads = True,
+     merged_lm_and_talk_heads = False,
+     merged_lm_and_think_heads = True,
+     use_concat_talk_head = True,
+     use_shallow_think = True,
+     use_shallow_talk = False,
+     use_complex_think_head = False,
+     use_complex_talk_head = True,
+     use_weighted_talk_head = True,
+     # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
+ )
+
+ tokenizer_id = "LeroyDyer/Mixtral_AI_CyberBrain_3.0" # assumed: the tokenizer shipped with this repo
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, truncation = True, padding_side = "right")
+ tokenizer.pad_token_id = tokenizer.eos_token_id
+
+ # attach the tokenizer to the model and switch to training mode
+ model.tokenizer = tokenizer
+ model.train()
+ ```
+
+ Right now the modelling_mistral.py is still having problems loading remotely, hence the hacky way; once that is fixed it will be fine.
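Once the remote code resolves, a plain trust_remote_code load should be all that is needed; a minimal sketch of that path:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code pulls the repo's custom modeling/configuration files
model = AutoModelForCausalLM.from_pretrained(
    "LeroyDyer/Mixtral_AI_CyberBrain_3.0",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_CyberBrain_3.0")
```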
  # merge

  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+ Yes, multiple versions of this model were merged in an attempt to grab the necessary tensors,
+ but somehow it did not build, as some parameters were not loading, i.e. it would not load the config file. Hopefully this will be rectified soon, so that remote loading works, enabling enhanced training.
+ The model was trained to perfection, so it still works fine!
+ The LoRA was made so that later it can be loaded with the model for further training of the affected tensors.
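A minimal sketch of re-attaching such a LoRA for further training; the adapter id below is a placeholder for wherever the LoRA was actually pushed:

```
from peft import PeftModel

# `model` is the base model loaded earlier in this README;
# "LeroyDyer/SOME_LORA_ADAPTER" is a placeholder adapter repo, not a real id
model = PeftModel.from_pretrained(model, "LeroyDyer/SOME_LORA_ADAPTER", is_trainable=True)
```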

  ## Merge Details
  ### Merge Method