Fails to run on vLLM
Hi,
I'm trying to run this model on vLLM. The first issue is that it can't find the tokenizer, so I added '--tokenizer=mistralai/Mistral-7B-Instruct-v0.3'.
Then the error is: 'No model.safetensors.index.json found in remote.'
I assume this is a configuration issue.
I'd appreciate your assistance.
Below are my vLLM params:
--port=8000
--model=cimphony-ai-admin/Cimphony-Mistral-Law-7B
--tokenizer-mode=mistral
--tokenizer=mistralai/Mistral-7B-Instruct-v0.3
--trust-remote-code
--gpu-memory-utilization=0.9
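(For context, the equivalent setup through vLLM's offline Python API would look roughly like the sketch below; the prompt is just a placeholder.)

from vllm import LLM, SamplingParams

# Same configuration as the CLI flags above, via the offline API (sketch only).
llm = LLM(
    model="cimphony-ai-admin/Cimphony-Mistral-Law-7B",
    tokenizer="mistralai/Mistral-7B-Instruct-v0.3",
    tokenizer_mode="mistral",
    trust_remote_code=True,
    gpu_memory_utilization=0.9,
)
outputs = llm.generate(["What is consideration in contract law?"],
                       SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)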
This repo contains only the LoRA adapter; the base model is Mistral-v0.1. You can use it directly with the PEFT library, or with vLLM by following these instructions: https://docs.vllm.ai/en/latest/models/lora.html
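For example, an offline sketch along the lines of those docs (the adapter name and prompt below are placeholders):

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Download the adapter locally and attach it to the Mistral-v0.1 base at request time.
adapter_path = snapshot_download("cimphony-ai-admin/Cimphony-Mistral-Law-7B")

llm = LLM(model="mistralai/Mistral-7B-v0.1", enable_lora=True)
outputs = llm.generate(
    ["What is consideration in contract law?"],
    SamplingParams(max_tokens=128),
    lora_request=LoRARequest("cimphony-mistral-law-7b", 1, adapter_path),
)
print(outputs[0].outputs[0].text)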
Got it.
I've followed the instructions, and now I'm getting another error:
OSError: Found 0 files matching the pattern: {matched_files}. Make sure that a Mistral tokenizer is present in {tokenizer_name}.
Running with params:
model='mistralai/Mistral-7B-v0.1', speculative_config=None, tokenizer='mistralai/Mistral-7B-v0.1', skip_tokenizer_init=False, tokenizer_mode=mistral, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=mistralai/Mistral-7B-v0.1, use_v2_block_manager=False, num_scheduler_steps=1, multi_step_stream_outputs=False, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=True, mm_processor_kwargs=None)
BTW, I get this error whether I specify a tokenizer or not.
I've tried specifying both the Mistral v0.1 tokenizer and cimphony-ai-admin/Cimphony-Mistral-Law-7B.
I'm assuming this is a configuration issue?
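In case it helps, this is roughly how I'd check which tokenizer files each repo actually ships (untested sketch):

from huggingface_hub import list_repo_files

# List tokenizer-related files in each repo (the base repo may need an HF token
# if it is gated). tokenizer_mode=mistral appears to look for Mistral-native
# files (e.g. tekken.json / tokenizer.model.v*) rather than tokenizer.json.
for repo in ("mistralai/Mistral-7B-v0.1",
             "cimphony-ai-admin/Cimphony-Mistral-Law-7B"):
    files = [f for f in list_repo_files(repo) if "token" in f.lower()]
    print(repo, "->", files)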
@iarbel
Could you please take a look at the error above?
I would love to run some benchmarks on this model.
Here's how I run vLLM now:
--model=mistralai/Mistral-7B-v0.1
--tokenizer-mode=mistral
--tokenizer=cimphony-ai-admin/Cimphony-Mistral-Law-7B
--enable-lora
--lora-modules='{"name": "cimphony-mistral-law-7b", "path": "cimphony-ai-admin/Cimphony-Mistral-Law-7B", "base_model_name": "mistralai/Mistral-7B-v0.1"}'
--trust-remote-code
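For the benchmarks, the idea is to query the adapter by the name registered in --lora-modules through the OpenAI-compatible endpoint, roughly like this (sketch; localhost:8000 assumed):

from openai import OpenAI

# Query the LoRA module by the name given in --lora-modules
# (assumes vLLM's OpenAI-compatible server on localhost:8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="cimphony-mistral-law-7b",
    prompt="What is consideration in contract law?",
    max_tokens=128,
)
print(resp.choices[0].text)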
I'm not sure what the problem is here. Are you able to load Mistral-v0.1 with some other, arbitrary LoRA adapter?
Also, you can try loading it directly through the Transformers / PEFT library.
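Something along these lines should work (untested sketch; I'm assuming the base-model tokenizer):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, attach the LoRA adapter, and optionally merge it.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "cimphony-ai-admin/Cimphony-Mistral-Law-7B")
# merge_and_unload() yields a plain Mistral checkpoint you could also save
# and serve with vLLM as a full (non-LoRA) model.
model = model.merge_and_unload()

inputs = tokenizer("What is consideration in contract law?",
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))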