---
license: apache-2.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
- argilla/CapybaraHermes-2.5-Mistral-7B
- MediaTek-Research/Breeze-7B-Instruct-v0_1
base_model:
- argilla/CapybaraHermes-2.5-Mistral-7B
- MediaTek-Research/Breeze-7B-Instruct-v0_1
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6409720c9e9f790c905ba4bf/v6B0CkdpR74oCetV3w0y-.png)


# shizhi-twilight-7B (試製-暮光-7B)

shizhi-twilight-7B (試製-暮光-7B) is a merge of the following models, made with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)
* [argilla/CapybaraHermes-2.5-Mistral-7B](https://huggingface.co/argilla/CapybaraHermes-2.5-Mistral-7B)

This is an experimental model. It tests whether high-quality fine-tuning done in one language (English) can be transferred to another language (Mandarin) by merging the two models with the SLERP (spherical linear interpolation) merge method.
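
For readers unfamiliar with SLERP, the following is a minimal sketch of spherical linear interpolation between two flattened weight vectors. It is an illustration only, not mergekit's actual code: the function name `slerp`, the `eps` threshold, and the use of plain Python lists are assumptions made to keep the example self-contained.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors.

    t=0 returns v0 and t=1 returns v1 (hypothetical helper, not mergekit's API).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped so acos stays in range.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)
```

Unlike a plain weighted average, the interpolated vector follows the arc between the two inputs, so its norm does not shrink toward the middle of the path.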

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: MediaTek-Research/Breeze-7B-Instruct-v0_1
        layer_range: [0, 32]
      - model: argilla/CapybaraHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: MediaTek-Research/Breeze-7B-Instruct-v0_1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
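
The lists given for `t` act as a gradient across the layer stack. The sketch below shows one way to read them, assuming the values are evenly spaced anchor points that are linearly interpolated over the 32 layers. The helper `layer_t` is hypothetical, and mergekit's exact mapping from layer index to `t` may differ in detail. In mergekit's slerp, `t = 0` keeps the base model (Breeze here) and `t = 1` takes the other model (CapybaraHermes).

```python
def layer_t(values, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values across the layers.

    Hypothetical helper: assumes anchors are evenly spaced from the first
    layer to the last, and that `values` has at least two entries.
    """
    if num_layers == 1:
        return values[0]
    # Position of this layer on the anchor axis, from 0 to len(values) - 1.
    pos = layer_idx / (num_layers - 1) * (len(values) - 1)
    lo = min(int(pos), len(values) - 2)
    frac = pos - lo
    return (1 - frac) * values[lo] + frac * values[lo + 1]


self_attn_t = [0, 0.5, 0.3, 0.7, 1]
mlp_t = [1, 0.5, 0.7, 0.3, 0]

# Print the interpolation factor for a few layers of the 32-layer stack.
for idx in (0, 8, 16, 24, 31):
    print(idx, round(layer_t(self_attn_t, idx, 32), 3), round(layer_t(mlp_t, idx, 32), 3))
```

Under this reading, the attention weights start at Breeze in the first layer and end at CapybaraHermes in the last, the MLP weights run the opposite way, and every other tensor uses the fixed `t` of 0.5.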

## 💻 Usage

```python
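# Notebook cell syntax; from a shell, run `pip install -qU transformers accelerate` instead.
!pip install -qU transformers accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "lipcut/shizhi-twilight-7B"
# The prompt asks, in Chinese, "What is a large language model?"
messages = [{"role": "user", "content": "什麼是大型語言模型?"}]

# Format the conversation with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; the returned text contains the prompt followed by the completion.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])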
```