---
license: apache-2.0
datasets:
- Locutusque/hercules-v5.0
base_model: M4-ai/Hercules-5.0-Qwen2-1.5B
language:
- en
inference:
  parameters:
    do_sample: true
    temperature: 0.8
    top_p: 0.95
    top_k: 40
    min_p: 0.1
    max_new_tokens: 250
    repetition_penalty: 1.1
---

# Hercules-5.0-Qwen2-1.5B-GGUF

This is a quantized version of [M4-ai/Hercules-5.0-Qwen2-1.5B](https://huggingface.co/M4-ai/Hercules-5.0-Qwen2-1.5B), created with llama.cpp.
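
As a quick start, here is a minimal sketch of running one of the GGUF files with the `llama-cpp-python` bindings, using the sampling settings from this card's front matter; the quantization filename below is an assumption, so substitute whichever GGUF file you download.

```python
from llama_cpp import Llama

# Hypothetical filename: replace with the GGUF quantization you downloaded.
llm = Llama(model_path="Hercules-5.0-Qwen2-1.5B.Q4_K_M.gguf", n_ctx=2048)

# create_chat_completion applies the model's ChatML chat template automatically.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    # Sampling parameters taken from the card's recommended inference settings.
    temperature=0.8,
    top_p=0.95,
    top_k=40,
    min_p=0.1,
    max_tokens=250,
    repeat_penalty=1.1,
)
print(out["choices"][0]["message"]["content"])
```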

# Model Description

We fine-tuned Qwen2-1.5B on a high-quality data mix for general-purpose assistants. A DPO version of this model will be released soon. We use the ChatML prompt format.
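
For reference, a ChatML prompt looks like the following (the system message here is only an example; any system prompt can be used):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{your prompt here}<|im_end|>
<|im_start|>assistant
```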

## Model Details

This model has capabilities in math, coding, writing, and more. We fine-tuned it on a high-quality data mix for general-purpose assistants.

- **Developed by:** M4-ai
- **Language(s) (NLP):** English, and possibly Chinese inherited from the Qwen2 base model
- **License:** apache-2.0
- **Finetuned from model:** [qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B)

## Uses

General-purpose assistance, question answering, chain-of-thought reasoning, etc.

Notably, the model has correctly implemented multi-head attention for use in a transformer neural network, an impressive result for a model of this size.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Training Details

### Training Data

- Locutusque/hercules-v5.0

#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision

## Evaluations

Coming soon.

## Technical Specifications

#### Hardware

We trained on 8 Kaggle TPUs with a global batch size of 256 and a sequence length of 1536.