A fine-tuned version of the v2ray/Mixtral-8x22B-v0.1 model, trained on the following datasets:
This model has a total of 141B parameters, of which only 35B are active. The major difference in this version is that the model was trained on more datasets and with an 8192-token sequence length. As a result, the model can generate longer and more coherent responses.
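Because only a subset of experts is routed per token, the active-parameter count is much smaller than the total. A quick way to inspect the MoE layout is to load just the config (a sketch; the attribute names follow the Mixtral config format, and the exact values come from the model's own config.json):

```python
from transformers import AutoConfig

# Inspect the MoE layout without downloading the weights
config = AutoConfig.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
print(config.num_local_experts)        # experts per MoE layer
print(config.num_experts_per_tok)      # experts routed per token (the "active" subset)
print(config.max_position_embeddings)  # maximum context length declared in the config
```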
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```
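A minimal call on the pipeline might look like this (the prompt and generation settings are illustrative):

```python
# Generate a continuation; max_new_tokens is an illustrative setting
result = pipe("Mixture-of-experts models work by", max_new_tokens=64)
print(result[0]["generated_text"])
```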
Load model directly:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
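Continuing from the block above, a minimal generation sketch (the prompt and max_new_tokens are illustrative; a model of this size typically also needs dtype and device-placement settings appropriate to your hardware):

```python
# Tokenize a prompt, generate, and decode the result
inputs = tokenizer("Explain mixture-of-experts models in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```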
Base model: v2ray/Mixtral-8x22B-v0.1