Model not loading and not printing any error message

#45
by robotrage - opened

I see no spike in RAM or GPU usage.

code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="H:/llm/Cache", offload_folder="H:/llm/Cache", device_map="auto")
print("re")

model = None
try:
    model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="H:/llm/Cache")
except Exception as e:
    print("err: " + str(e))

print("result")
text = "what is the ld50 of alcohol"
inputs = tokenizer(text, return_tensors="pt")
print("result!")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True) + "result!!!!!")

output:

H:\llm\MOE\mixtral\Mixtral-8x7B-Instruct-v0.1>python bot.py
re

H:\llm\MOE\mixtral\Mixtral-8x7B-Instruct-v0.1>
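A silent exit like this, with not even the "err:" line printed, usually means the process died before the except block could run (for example, killed by the OS when memory runs out); no Python exception handler can catch that. A minimal sketch of more verbose failure reporting, where fake_load is a hypothetical stand-in for the real from_pretrained call:

```python
import traceback

def describe_failure(load_fn):
    """Run load_fn; return the full traceback string on failure, else None."""
    try:
        load_fn()
        return None
    except Exception:
        # traceback.format_exc() captures the whole stack trace, unlike
        # str(e), which can be empty for some exception types
        return traceback.format_exc()

def fake_load():
    # hypothetical stand-in for AutoModelForCausalLM.from_pretrained(...)
    raise MemoryError("not enough RAM to load checkpoint shards")

report = describe_failure(fake_load)
print(report, flush=True)  # flush=True so the message survives an abrupt exit
```

If even this prints nothing, the process was almost certainly killed from outside Python rather than failing with an exception.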

I would try loading the model in half precision with model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="H:/llm/Cache", torch_dtype=torch.float16) (unless you don't have a GPU; note this needs import torch). What hardware are you using?
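For scale, a rough back-of-the-envelope on why the dtype matters. Mixtral-8x7B has roughly 46.7B parameters (an approximation), so the weights alone need:

```python
# Approximate weight-only memory footprint of Mixtral-8x7B by dtype.
params = 46.7e9  # approximate total parameter count
bytes_per_param = {"float32": 4, "float16": 2, "8-bit": 1, "4-bit": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{dtype}: ~{gib:.0f} GiB just for the weights")
```

So even in float16 the checkpoint will not fit on a single consumer GPU, and loading in float32 can exhaust system RAM outright, which would explain a silent crash.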

Hi @robotrage,
There is no need to pass device_map="auto" in the tokenizer call; can you pass it in the AutoModel call instead? Also, please consider passing low_cpu_mem_usage=True:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="H:/llm/Cache")
print("re")

model = None
try:
    model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="H:/llm/Cache", device_map="auto", low_cpu_mem_usage=True)
except Exception as e:
    print("err: " + str(e))
