hywu committed
Commit fd8d582 • 1 Parent(s): 6019749

update README

Files changed (1)
  1. README.md +34 -28
README.md CHANGED
@@ -12,60 +12,66 @@ arxiv: 2401.02731
  license: apache-2.0
  ---
 
  # Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
 
  ## News
- - 1/10/2024 - Camelidae models are now available on [🤗HuggingFace](https://huggingface.co/hywu).
  - 1/4/2024 - We released the paper, [Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731).
- - 12/22/2023 - We released the training [repo](https://github.com/wuhy68/Parameter-Efficient-MoE) that craft the dense model with LLaMA architecture to the MoE model.
-
  ## Introduction
- Camelidae models are trained utilizing Parameter-Efficient Sparsity Crafting techniques
 
- Parameter-Efficient Sparsity Crafting can help dense models learn knowledge from different fields (including code and math). This appraoch perfrom instruction tuning and utilize MoE structure in an efficient way.
 
- Specifically, Parameter-Efficient Sparsity Crafting utilizes parameter efficient techiniques including [QLoRA](https://arxiv.org/abs/2305.14314) and [Adapter](https://arxiv.org/abs/1902.00751) to perfrom Efficient [Sparse Upcycling](https://arxiv.org/abs/2212.05055).
 
  ## Model Lists
- | Model | Download
  |---|---
- Camelidae-8x7B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x7B)
- Camelidae-8x13B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x13B)
- Camelidae-8x34B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x34B)
 
  ## Performance
- | Model | MMLU (5shot) | GSM8k (5shot) | MATH (4shot) | HumanEval (0shot) | MBPP (4shot) | HellaSwag (10shot) | TriviaQA (0shot) |
- |----------------------:|:------------:|:-------------:|:------------:|:-----------------:|:------------:|:------------------:|:----------------:|
- | GPT3.5 | 70.0% | 57.1% | **34.1%** | **48.1%** | - | 85.5% | - |
- | Camelidae-8x34B | 75.6% | **78.3%** | **22.6%** | **43.9%** | **41.4%** | 85.3% | **63.4%** |
- | SUSChat-34B | **76.4%** | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% | 56.1% |
- | Mixtral-8x7B-instruct | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | **86.5%** | 57.7% |
- | LLaMA2-70B-chat | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% | 63.0% |
- | Camelidae-8x13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% | 59.4% |
- | LLaMA2-13B-chat | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% | 55.0% |
- | Camelidae-8x7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% | 51.0% |
- | LLaMA2-7B-chat | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% | 46.4% |
 
- We bold the highest scores for open-source models and all models separately.
 
 
  ## Usage
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- # tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x7B", trust_remote_code=True)
- # tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x13B", trust_remote_code=True)
  tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
-
- # model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x7B", device_map="auto", trust_remote_code=True).eval()
- # model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x13B", device_map="auto", trust_remote_code=True).eval()
  model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()
 
  inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
  inputs = inputs.to(model.device)
  pred = model.generate(**inputs)
  print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
- # I am doing well, thank you.
  ```
 
  ## Citation
 
  license: apache-2.0
  ---
 
+
  # Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
 
  ## News
+ - 3/12/2024 - We released Qwen2idae-16x14B-v1.0 on 🤗 [HuggingFace](https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0), which has strong performance in Math and Code with 15B activated params.
+ - 2/7/2024 - [Serp-ai](https://github.com/serp-ai/Parameter-Efficient-MoE) added [unsloth](https://github.com/serp-ai/unsloth) support for faster and more memory-efficient training with our Parameter-Efficient Sparsity Crafting and released new [sparsetral](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) models based on mistral-7B.
+ - 1/10/2024 - Camelidae models are now available on 🤗 [HuggingFace](https://huggingface.co/hywu).
  - 1/4/2024 - We released the paper, [Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731).
+ - 12/22/2023 - We released the training repo that crafts a dense model with the LLaMA architecture into an MoE model.
  ## Introduction
+ Camelidae and Qwen2idae models are trained utilizing Parameter-Efficient Sparsity Crafting techniques.
 
+ We present Parameter-Efficient Sparsity Crafting to help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and efficiently utilizes the MoE structure.
 
+ Specifically, Parameter-Efficient Sparsity Crafting utilizes parameter-efficient techniques including [QLoRA](https://arxiv.org/abs/2305.14314) and [Adapter](https://arxiv.org/abs/1902.00751) to perform Efficient [Sparse Upcycling](https://arxiv.org/abs/2212.05055).
 
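The paragraph above only names the ingredients, so here is a minimal, illustrative PyTorch sketch of the general idea: a frozen dense FFN is upcycled into an MoE-style block by attaching small adapter experts and a top-k router, so that only the adapters and the router are trained. The class names, expert count, and bottleneck size below are hypothetical choices for illustration, not values from the Camelidae/Qwen2idae code, and the QLoRA quantization of the frozen base weights is omitted for brevity.

```python
# Illustrative sketch only; names and hyperparameters are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdapterExpert(nn.Module):
    """A small bottleneck adapter that acts as one lightweight expert."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(F.silu(self.down(x)))

class SparseUpcycledFFN(nn.Module):
    """Wraps a frozen dense FFN with adapter experts and a top-k router."""
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.dense_ffn = dense_ffn
        for p in self.dense_ffn.parameters():
            p.requires_grad_(False)          # base FFN stays frozen
        self.experts = nn.ModuleList(AdapterExpert(hidden_size)
                                     for _ in range(num_experts))
        self.router = nn.Linear(hidden_size, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        out = self.dense_ffn(x)              # shared dense computation
        logits = self.router(x)              # per-token routing scores
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        # Written for clarity, not speed: every expert runs on every token
        # and a routing mask selects which outputs are kept.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out
```

Only the adapter and router parameters require gradients in this sketch, which is what keeps the crafted MoE model cheap to instruction-tune.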
  ## Model Lists
+ | Camelidae Series | Download
  |---|---
+ Camelidae-8x7B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x7B)
+ Camelidae-8x13B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x13B)
+ Camelidae-8x34B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x34B)
+ Camelidae-8x34B-pro | 🤗 Coming Soon
+
+ | Qwen2idae Series | Download
+ |---|---
+ Qwen2idae-16x14B-v1.0 | 🤗 [HuggingFace](https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0)
+ Qwen2idae-16x7B-v1.0 | 🤗 Coming Soon
+ Qwen2idae-16x1.8B-v1.0 | 🤗 Coming Soon
 
  ## Performance
+ | Model | Activated Params | MMLU (5shot) | GSM8k (5shot) | MATH (4shot) | HumanEval (0shot) | MBPP (4shot) | HellaSwag (10shot) |
+ |:-----:|:----------------:|:------------:|:-------------:|:------------:|:-----------------:|:------------:|:------------------:|
+ | GPT3.5 | - | 70.0% | 57.1% | <font color=#F67F70>**34.1%**</font> | <font color=#FBD98D>**48.1%**</font> | - | <font color=#7FEA9E>**85.5%**</font> |
+ | LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
+ | Camelidae-8x34B-pro | 35B | <font color=#7FEA9E>**75.7%**</font> | <font color=#F67F70>**79.4%**</font> | <font color=#FBD98D>**24.0%**</font> | <font color=#7FEA9E>**48.8%**</font> | <font color=#7FEA9E>**43.2%**</font> | 85.2% |
+ | Camelidae-8x34B | 35B | <font color=#FBD98D>**75.6%**</font> | <font color=#7FEA9E>**78.3%**</font> | 22.6% | 43.9% | <font color=#FBD98D>**41.4%**</font> | <font color=#FBD98D>**85.3%**</font> |
+ | SUSChat-34B | 34B | <font color=#F67F70>**76.4%**</font> | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
+ | Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
+ | Qwen2idae-16x14B-v1.0 | 15B | 66.7% | <font color=#FBD98D>**77.8%**</font> | <font color=#7FEA9E>**29.9%**</font> | <font color=#F67F70>**62.8%**</font> | <font color=#F67F70>**48.6%**</font> | 82.3% |
+ | Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | <font color=#F67F70>**86.5%**</font> |
+ | Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
+ | LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
+ | Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
+ | LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |
 
+ We bold the top-3 scores in each column across all models.
 
 
  ## Usage
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
  tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()
 
  inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
  inputs = inputs.to(model.device)
  pred = model.generate(**inputs)
  print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
  ```
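The snippet above loads the model in full precision and uses default generation settings. If GPU memory is tight, 4-bit loading through bitsandbytes may be worth trying; this README does not state whether the custom (trust_remote_code) model class supports it, so the following is an untested sketch, and the sampling arguments are illustrative defaults rather than recommended values.

```python
# Untested sketch: 4-bit loading and explicit generation settings.
# Whether the remote-code model class accepts a quantization_config is an
# assumption, not something documented in this README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # QLoRA-style NF4 weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "hywu/Camelidae-8x34B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Same prompt template as above; cap the output length and sample mildly.
inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt').to(model.device)
pred = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```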
 
  ## Citation