Deploying a fine-tuned model with custom inference code
I tried to deploy a fine-tuned Mistral-7B-Instruct-v0.3 to SageMaker with my custom inference code. The deployment goes through without any errors or warnings, but it completely ignores my inference code! I don't see anything in the CloudWatch logs about installing packages from requirements.txt or about the model being loaded via model_fn.
I can send a regular prompt to the endpoint and get a response, but nothing from my inference code runs. I checked the structure of my model.tar.gz: the code folder is there and contains the entry point script and requirements.txt.
What could be the issue here? Am I missing something?
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM execution role for the endpoint

# s3 path where the fine-tuned model archive was uploaded
# if you deploy the model at a different time, update the s3 path here
model_s3_path = "s3://my-bucket/mistral-fine-tuned-custom-2024-06-28/model.tar.gz"
image_uri = "huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.3-gpu-py310-cu121-ubuntu22.04-v2.0"

# sagemaker config
instance_type = "ml.g5.24xlarge"
number_of_gpu = 4
health_check_timeout = 900

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': "/opt/ml/model",            # path where SageMaker extracts the model
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(24000),     # max length of the input text
    'MAX_TOTAL_TOKENS': json.dumps(30000),     # max length of the generation (including input text)
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(30001),
    'MAX_BATCH_PREFILL_TOKENS': json.dumps(30000)
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    model_data=model_s3_path,
    entry_point="finetuned_model_entrypoint.py",
    env=config
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 15 minutes to be able to load the model
    endpoint_name="mistral-fine-tuned-stage"
)
@philschmid Appreciate your help here.
If you want to use a requirements.txt or an inference.py, you need to use the regular Hugging Face inference container, not the TGI container.
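Roughly, the regular container installs code/requirements.txt from the archive and runs code/inference.py instead of the default handlers. A minimal sketch, assuming an archive layout like the one in the comments below and DLC versions that may need adjusting to what your SDK release supports (model_s3_path and role are the values from the question):

from sagemaker.huggingface import HuggingFaceModel

# Expected model.tar.gz layout for the regular inference container:
#   model.tar.gz
#   ├── config.json, *.safetensors, tokenizer files, ...
#   └── code/
#       ├── inference.py        # custom handlers (model_fn, predict_fn, ...)
#       └── requirements.txt    # extra packages installed at container start

huggingface_model = HuggingFaceModel(
    model_data=model_s3_path,        # s3 path from the question
    role=role,
    transformers_version="4.37",     # assumed versions; check the Hugging Face
    pytorch_version="2.1",           # DLCs supported by your SDK release
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.24xlarge",
    container_startup_health_check_timeout=900,
)

Because no image_uri is passed, the SDK resolves the regular Hugging Face PyTorch inference image from the version arguments rather than the TGI image.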
Oh, I see. I suspected this wasn't supported, but since I saw the entry_point parameter in HuggingFaceModel (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/huggingface/model.py#L119), I thought it should be doable. Thanks a lot for your prompt reply.
I actually tried to deploy this as a PyTorchModel with a regular container, but there I got a model load failure without any further context in the logs. I changed the versions of torch and transformers, but that didn't work, so I thought I could deploy with TGI instead. I'd appreciate it if you could point me to an example for this case.
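For reference, a minimal sketch of what code/inference.py could look like for the regular container, assuming a standard transformers fine-tune (model_fn and predict_fn are the handler hooks the SageMaker Hugging Face inference toolkit looks for; the generation parameters are illustrative):

# code/inference.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # model_dir is where SageMaker extracts model.tar.gz (/opt/ml/model)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype=torch.bfloat16, device_map="auto"
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # data is the deserialized request payload, e.g. {"inputs": "..."}
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}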