System requirements?
What are the system requirements to run Mistral-Nemo on-device? I received a CUDA out-of-memory error when running:
mistral-chat $HOME/mistral_models/Nemo-Instruct --instruct --max_tokens 256 --temperature 0.35
Traceback (most recent call last):
  File "/home/nacho/.local/bin/mistral-chat", line 8, in <module>
    sys.exit(mistral_chat())
  File "/home/nacho/.local/lib/python3.10/site-packages/mistral_inference/main.py", line 203, in mistral_chat
    fire.Fire(interactive)
  File "/home/nacho/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/nacho/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/nacho/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/nacho/.local/lib/python3.10/site-packages/mistral_inference/main.py", line 83, in interactive
    model = model_cls.from_folder(Path(model_path), max_batch_size=3, num_pipeline_ranks=num_pipeline_ranks)
  File "/home/nacho/.local/lib/python3.10/site-packages/mistral_inference/transformer.py", line 367, in from_folder
    return model.to(device=device, dtype=dtype)
  File "/home/nacho/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/nacho/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/nacho/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/nacho/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/nacho/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/nacho/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 140.00 MiB. GPU
You need at least 12 GB of VRAM for this version with a 16k context. If your GPU has less than that, try a 4.65 bpw quant instead; there are many EXL2 quants available.
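For intuition, here is a rough back-of-envelope estimate of the weight memory alone (a sketch, assuming Mistral-Nemo's 12B parameter count; it ignores KV cache and activation overhead, which is why the practical requirement is higher than the quantized weight size suggests):

```python
# Rough VRAM needed just to hold the weights of a 12B-parameter model.
# Assumption: ~12e9 parameters (Mistral-Nemo); excludes KV cache/activations.
params = 12e9

weights_bf16_gb = params * 2 / 1e9          # bf16 = 2 bytes per parameter
weights_465bpw_gb = params * 4.65 / 8 / 1e9  # 4.65 bits per weight (EXL2 quant)

print(f"bf16 weights:     ~{weights_bf16_gb:.1f} GB")
print(f"4.65 bpw weights: ~{weights_465bpw_gb:.1f} GB")
```

This shows why the full-precision model overflows a consumer GPU while a 4.65 bpw quant leaves headroom for the 16k-token KV cache.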