---
license: apache-2.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
- argilla/CapybaraHermes-2.5-Mistral-7B
- MediaTek-Research/Breeze-7B-Instruct-v0_1
base_model:
- argilla/CapybaraHermes-2.5-Mistral-7B
- MediaTek-Research/Breeze-7B-Instruct-v0_1
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6409720c9e9f790c905ba4bf/v6B0CkdpR74oCetV3w0y-.png)
# 試製-暮光-7B
試製-暮光-7B 是用[LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing)融合以下模型生成的:
* [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)
* [argilla/CapybaraHermes-2.5-Mistral-7B](https://huggingface.co/argilla/CapybaraHermes-2.5-Mistral-7B)
這是一個實驗模型,目的是爲了檢驗套用在不同語言上的高品質模型調教是否能夠轉移(此模型爲英文到中文)。
# shizhi-twilight-7B
shizhi-twilight-7B is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)
* [argilla/CapybaraHermes-2.5-Mistral-7B](https://huggingface.co/argilla/CapybaraHermes-2.5-Mistral-7B)
This is an experimental model that tests whether high-quality fine-tuning done in one language (English) can be transferred to another language (Mandarin) by merging with the SLERP method.
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: MediaTek-Research/Breeze-7B-Instruct-v0_1
        layer_range: [0, 32]
      - model: argilla/CapybaraHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: MediaTek-Research/Breeze-7B-Instruct-v0_1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
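
To make the `t` values in the configuration more concrete, here is a minimal pure-Python sketch of spherical linear interpolation and of how a five-value list might be spread over 32 layers. It is not mergekit's implementation. The helper names `slerp` and `layer_t` are hypothetical, and two points are assumptions rather than facts taken from this card: that the anchor values are interpolated linearly across layer depth, and that `t = 0` keeps the `base_model` weights while `t = 1` takes the other model's.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat weight vectors (lists of floats)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, computed from their normalized dot product.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel vectors: slerp degenerates, use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values over layer depth (assumed behaviour)."""
    if num_layers == 1 or len(anchors) == 1:
        return float(anchors[0])
    # Map the layer index onto the anchor list's index range [0, len(anchors) - 1].
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return anchors[lo] * (1 - frac) + anchors[hi] * frac


# Anchor lists copied from the configuration above.
SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]

if __name__ == "__main__":
    for layer in (0, 8, 16, 24, 31):
        print(layer, layer_t(SELF_ATTN_T, layer, 32), layer_t(MLP_T, layer, 32))
```

Under these assumptions the two anchor lists mirror each other: at every layer the `self_attn` and `mlp` values sum to 1, so wherever the attention weights lean toward one model, the MLP weights lean toward the other. Tensors matched by neither filter use the constant `0.5`.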
## 💻 Usage
```python
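# In a notebook, install the dependencies first:
#   !pip install -qU transformers accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "lipcut/shizhi-twilight-7B"
# The prompt asks, in Chinese, "What is a large language model?"
messages = [{"role": "user", "content": "什麼是大型語言模型?"}]

# Render the chat messages into the prompt format defined by the tokenizer's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline; device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the prompt together with the completion.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])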
```