---
license: apache-2.0
language:
- en
---

<div align="center">

# TinyMix-8x1b-Chat

</div>

This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit).

The goal was to MoE-fy the TinyLlama model and then use the result as a base model for further finetuning. The intuition is that finetuning an 8x1B mixture should give better performance than finetuning a single 1B model on its own.

More work coming!

# Chat Template

```python
def make_prompt(instruction):
    # ChatML-style prompt format used by TinyLlama-1.1B-Chat
    return f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

# `llm` stands for whatever generation backend you have loaded
# (for example a vLLM LLM instance); swap in your own generate call.
llm.generate(make_prompt('What is quantum tunneling?'))
```
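
For reference, below is a minimal sketch of using this template with the `transformers` library. The repo id is a placeholder, not the confirmed path of this model; substitute the full `<user>/<repo>` id, and adjust generation settings to taste.

```python
# Minimal usage sketch with transformers; the repo id below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyMix-8x1b-Chat"  # placeholder: use the full <user>/<repo> id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

def make_prompt(instruction):
    return f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

# Build the prompt, generate, and decode only the newly generated tokens.
inputs = tokenizer(make_prompt("What is quantum tunneling?"), return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```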

## Mergekit Config

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
```
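
To reproduce the merge, the Mixtral branch of mergekit provides a `mergekit-moe` entry point that takes a config like the one above plus an output directory. The invocation below is a sketch; the config filename and output path are placeholders, and flags may differ between mergekit versions, so check that branch's README.

```sh
# Sketch: run the MoE merge with mergekit-moe.
# "config.yml" holds the YAML above; the second argument is the output directory.
mergekit-moe config.yml ./TinyMix-8x1b-Chat
```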