The operation failed due to 'weight lm_head.weight does not exist'
#3, opened by prashil02
While deploying this model, I see the following error:
2024-03-28T06:02:10.580413Z INFO text_generation_launcher: Using configured max_sequence_length: 2048
2024-03-28T06:02:10.580423Z INFO text_generation_launcher: Setting PYTORCH_CUDA_ALLOC_CONF to default value: expandable_segments:True
2024-03-28T06:02:10.580796Z INFO text_generation_launcher: Starting shard 0
Shard 0: There was a problem when trying to write in your cache folder (/home/tgis/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Shard 0: HAS_BITS_AND_BYTES=False, HAS_EXLLAMA=True, EXLLAMA_VERSION=2
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
Shard 0: Traceback (most recent call last):
Shard 0:
Shard 0: File "/opt/tgis/bin/text-generation-server", line 8, in <module>
Shard 0: sys.exit(app())
Shard 0: ^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/cli.py", line 75, in serve
Shard 0: raise e
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/cli.py", line 56, in serve
Shard 0: server.serve(
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/server.py", line 389, in serve
Shard 0: asyncio.run(
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/asyncio/runners.py", line 190, in run
Shard 0: return runner.run(main)
Shard 0: ^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/asyncio/runners.py", line 118, in run
Shard 0: return self._loop.run_until_complete(task)
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
Shard 0: return future.result()
Shard 0: ^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/server.py", line 267, in serve_inner
Shard 0: model = get_model(
Shard 0: ^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/models/__init__.py", line 129, in get_model
Shard 0: return Seq2SeqLM(model_name, revision, deployment_framework, dtype, quantize, model_config, max_sequence_length)
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/models/seq2seq_lm.py", line 557, in __init__
Shard 0: inference_engine = get_inference_engine_class(deployment_framework)(
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/inference_engine/tgis_native.py", line 117, in __init__
Shard 0: model = model_class(self._config, weights)
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1043, in __init__
Shard 0: self.lm_head = TensorParallelHead.load(
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/utils/layers.py", line 219, in load
Shard 0: weight = weights.get_tensor(f"{prefix}.weight")
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/utils/weights.py", line 69, in get_tensor
Shard 0: filename, tensor_name = self.get_filename(tensor_name)
Shard 0: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shard 0:
Shard 0: File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/utils/weights.py", line 56, in get_filename
Shard 0: raise RuntimeError(f"weight {tensor_name} does not exist")
Shard 0:
Shard 0: RuntimeError: weight lm_head.weight does not exist
Shard 0:
2024-03-28T06:02:14.967668Z ERROR text_generation_launcher: Shard 0 failed: ExitStatus(unix_wait_status(256))
2024-03-28T06:02:15.067414Z INFO text_generation_launcher: Shutting down shards
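
The traceback ends in `get_filename`, which raises when none of the loaded weight files lists the requested tensor name. One way to confirm what the checkpoint actually contains is to read the tensor names directly from the weight files. The following is a minimal sketch, not part of the original report. It assumes the model is stored as `.safetensors` files in a local directory, and the helper names `list_tensor_names` and `find_missing` are hypothetical. It relies only on the documented safetensors layout: an 8-byte little-endian header length followed by a JSON header keyed by tensor name.

```python
import json
import struct
from pathlib import Path


def list_tensor_names(path):
    """Return the tensor names stored in one .safetensors file."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian unsigned integer
        # giving the size of the JSON header that follows.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")


def find_missing(model_dir, required=("lm_head.weight",)):
    """Return the names in `required` that no shard in model_dir contains."""
    present = set()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        present.update(list_tensor_names(shard))
    return [name for name in required if name not in present]
```

Calling `find_missing("<model directory>")` on the downloaded snapshot would return `["lm_head.weight"]` if the tensor is absent from every shard, which would match the `RuntimeError` above. It does not explain why the tensor is absent.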