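---

This project is built on Meta's [Llama 3 8B model](https://huggingface.co/meta-llama/Meta-Llama-3-8B). The MLP is copied 8 times to serve as the experts, a randomly initialized router is added, and all other weights are kept unchanged, yielding a warm-started MoE model. Compared with training an MoE model from scratch, this greatly reduces cost and makes the model quick to fine-tune on downstream tasks.

---

> `router_warmboot` is the variant whose llama3-MoE-base router parameters are initialized from the router of [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral); `router_random` is the variant with a randomly initialized router.

**For details, see the GitHub repository [https://github.com/cooper12121/llama3-8x8b-MoE](https://github.com/cooper12121/llama3-8x8b-MoE)**

**generate**

```python
import sys

import torch

# Make the modeling_file package (from the GitHub repository) importable;
# replace this path with the directory where you placed it.
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")
from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

# Local path to the llama3-8x8b-MoE-base checkpoint; adjust as needed.
model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"

tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
# Llama 3 has no pad token, so reuse EOS. Decoder-only models need
# left padding for correct batched generation.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

model = LlamaMoEForCausalLM.from_pretrained(
    model_ckpt,
    device_map="auto",           # spread layers over the available GPUs
    torch_dtype=torch.bfloat16,  # fp32 would need about twice the memory
    use_cache=False,
)

text_list = ["hello,what is your name?", "你好,你叫什么名字"]
inputs = tokenizer(text_list, return_tensors="pt", padding=True).to(model.device)

output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
```

**The `modeling_file` package can be obtained from the GitHub repository.**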
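The warm-start construction (copy the dense MLP into 8 experts, add a randomly initialized router, keep everything else) can be sketched in PyTorch as follows. This is a minimal, hypothetical illustration, not the repository's `modeling_llama_moe` code: the class names `LlamaMLP` and `UpcycledMoE`, the Mixtral-style top-2 routing (softmax, top-k, then renormalization), and the toy sizes are all assumptions.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaMLP(nn.Module):
    """Llama-style SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class UpcycledMoE(nn.Module):
    """Hypothetical warm-started MoE layer built from one pretrained dense MLP."""

    def __init__(self, dense_mlp: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent copy of the pretrained MLP weights.
        self.experts = nn.ModuleList([copy.deepcopy(dense_mlp) for _ in range(num_experts)])
        # New router; nn.Linear's default init makes it random.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        probs = self.router(x).softmax(dim=-1)
        weights, indices = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e  # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Toy sizes (Llama 3 8B uses hidden 4096, intermediate 14336).
torch.manual_seed(0)
dense = LlamaMLP(hidden_size=16, intermediate_size=64)
moe = UpcycledMoE(dense, hidden_size=16)
x = torch.randn(5, 16)
with torch.no_grad():
    print(torch.allclose(moe(x), dense(x), atol=1e-5))  # True
```

Because every expert starts as an exact copy and the top-k weights are renormalized to sum to 1, this layer initially computes the same function as the dense MLP, so the random router does not disturb the pretrained behavior at initialization. The experts diverge only once fine-tuning updates them differently. If the actual implementation does not renormalize, the initial output is instead scaled by the top-k probability mass.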
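For the `router_warmboot` variant, reusing the Chinese-Mixtral router amounts to a per-layer state-dict copy. The sketch below assumes the Chinese-Mixtral checkpoint uses the Hugging Face Mixtral key naming (`model.layers.{i}.block_sparse_moe.gate.weight`, shape `(8, 4096)`). The function `warmboot_routers` and the target key template `dst_key_fmt` are hypothetical, so check the real router parameter name in `modeling_llama_moe`. The shapes line up because Mixtral-8x7B and Llama 3 8B both have 32 layers and a hidden size of 4096.

```python
import torch

# Router key as used by Hugging Face Mixtral checkpoints (assumed for Chinese-Mixtral).
MIXTRAL_ROUTER_KEY = "model.layers.{i}.block_sparse_moe.gate.weight"


def warmboot_routers(src_state: dict, dst_state: dict, num_layers: int,
                     dst_key_fmt: str) -> dict:
    """Return a copy of dst_state whose per-layer routers come from src_state.

    dst_key_fmt is a hypothetical template for the target router key,
    e.g. "model.layers.{i}.mlp.router.weight".
    """
    out = dict(dst_state)  # shallow copy: dst_state itself is not modified
    for i in range(num_layers):
        src = src_state[MIXTRAL_ROUTER_KEY.format(i=i)]
        key = dst_key_fmt.format(i=i)
        if out[key].shape != src.shape:
            raise ValueError(
                f"layer {i}: source {tuple(src.shape)} != target {tuple(out[key].shape)}"
            )
        out[key] = src.detach().clone().to(out[key].dtype)
    return out


# Toy example: 2 layers, hidden size 16 (instead of 32 layers, 4096).
torch.manual_seed(0)
fmt = "model.layers.{i}.mlp.router.weight"  # hypothetical target key template
src = {MIXTRAL_ROUTER_KEY.format(i=i): torch.randn(8, 16) for i in range(2)}
dst = {fmt.format(i=i): torch.zeros(8, 16) for i in range(2)}
new = warmboot_routers(src, dst, num_layers=2, dst_key_fmt=fmt)
print(torch.equal(new[fmt.format(i=0)], src[MIXTRAL_ROUTER_KEY.format(i=0)]))  # True
```

In practice `src_state` would be built from the Chinese-Mixtral weight shards (for example with `safetensors.torch.load_file`), `dst_state` would be `model.state_dict()`, and the returned dict would be passed to `model.load_state_dict`.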