eastwind
/

tinymix-8x1b-chat

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

tinymix-8x1b-chat / README.md

eastwind's picture

Update README.md

9f171fa 10 months ago

|

1.45 kB

	---
	license: apache-2.0
	language:
	- en
	---
	<div align="center">

	# TinyMix-8x1b-Chat
	</div>

	This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit)

	The Goal was to MoE-fy the TinyLlama model and then use this as a base model to finetune from. The intuition being finetuning 8x1b should give better performance than finetuning 1b by itself.

	More work coming!

	# Chat Template
	```
	def make_prompt(instruction):
	return f"<\|im_start\|>user\n{instruction}<\|im_end\|>\n<\|im_start\|>assistant\n"

	llm.generate(make_prompt('What is quantum tunneling?'))
	```

	## Mergekit Config
	```
	base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	gate_mode: hidden
	dtype: bfloat16
	experts:
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts: [""]
	```