The model is not loading and not printing any error message. I see no spike in RAM or GPU usage either.
code:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="H:/llm/Cache", offload_folder="H:/llm/Cache", device_map="auto")
print("re")
model = None
try:
    model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="H:/llm/Cache")
except Exception as e:
    print("err: " + str(e))
print("result")
text = "what is the ld50 of alcohol"
inputs = tokenizer(text, return_tensors="pt")
print("result!")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True) + "result!!!!!")
output:
H:\llm\MOE\mixtral\Mixtral-8x7B-Instruct-v0.1>python bot.py
re
H:\llm\MOE\mixtral\Mixtral-8x7B-Instruct-v0.1>
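One detail worth noting in that log: the script dies inside `from_pretrained` without the `except` block ever printing, which suggests the Python process is being killed by the OS (typically when RAM/pagefile runs out) rather than an exception being raised. A way to confirm, sketched for a POSIX shell (on Windows cmd the equivalent is `echo %errorlevel%`):

```shell
python bot.py
# A process killed by the OS leaves a non-zero exit status
# but prints no Python traceback.
echo $?
```

If the status is non-zero and there is no traceback, no `except Exception` handler could have caught it, because the interpreter itself was terminated.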
I would try loading a smaller-memory version of the model in half precision, with model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="H:/llm/Cache", torch_dtype=torch.float16) (unless you don't have a GPU; note this also needs import torch). What hardware setup are you using?
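For scale, Mixtral-8x7B has roughly 46.7B parameters in total (per the model card), so a rough back-of-envelope for the weights alone:

```python
# Rough memory footprint of the Mixtral-8x7B weights alone; activations,
# KV cache, and framework overhead all come on top of this.
total_params = 46.7e9  # ~46.7B parameters, per the Mixtral model card

for dtype, bytes_per_param in [("float32", 4), ("float16", 2)]:
    gib = total_params * bytes_per_param / 1024**3
    print(f"{dtype}: ~{gib:.0f} GiB")  # float32: ~174 GiB, float16: ~87 GiB
```

Even in float16 the full model is far larger than typical consumer RAM, which would be consistent with the loader dying silently partway through.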
Hi @robotrage, there is no need to pass device_map="auto" to the tokenizer call; can you pass it in the AutoModel call instead? Also, please consider passing low_cpu_mem_usage=True:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="H:/llm/Cache", offload_folder="H:/llm/Cache")
print("re")
model = None
try:
    model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="H:/llm/Cache", device_map="auto", low_cpu_mem_usage=True)
except Exception as e:
    print("err: " + str(e))