Deployment on Amazon SageMaker Endpoint
#33 · opened by dgallitelli
Hello! I tried to deploy on a SageMaker endpoint, with both the TGI and LMI containers, to no avail. Both attempts fail with:

`NotImplementedError: sharded is not supported for AutoModel`

Any suggestions?
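
If I read the error right, TGI falls back to a plain transformers `AutoModel` for architectures it does not natively support, and that fallback path cannot be sharded across GPUs. A minimal sketch of the env change I could try, assuming the TGI container maps `SM_NUM_GPUS` to `NUM_SHARD` (at the cost of tensor parallelism):

```python
# Hedged workaround sketch: request a single shard so TGI's AutoModel
# fallback is not asked to shard across GPUs.
env = {
    "HF_MODEL_ID": hf_model_id,  # same model id as in the deployment code below
    "SM_NUM_GPUS": "1",          # maps to --num-shard 1 in the TGI container
    # For LMI, the analogous knob would be OPTION_TENSOR_PARALLEL_DEGREE.
}
```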
Images used:
- LMI: `763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124`
- TGI: `763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi3.0.1-gpu-py311-cu124-ubuntu22.04`
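
For what it's worth, the TGI image can also be resolved through the SageMaker SDK rather than hard-coding the ECR path; a minimal sketch, where the `version` string is my assumption based on the `tgi3.0.1` tag above:

```python
# Hedged sketch: look up the HF LLM inference image URI via the SDK.
from sagemaker.huggingface import get_huggingface_llm_image_uri

inference_image_uri = get_huggingface_llm_image_uri(
    "huggingface",    # TGI backend ("lmi" selects the DJL LMI image)
    version="3.0.1",  # assumed to correspond to the tgi3.0.1 tag above
)
```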
Code used for deployment:
```python
import json

import sagemaker

# inference_image_uri, hf_model_id, number_of_gpu, model_name, endpoint_name,
# and instance_type are defined earlier in the notebook.
model = sagemaker.Model(
    image_uri=inference_image_uri,
    env={
        "HF_MODEL_ID": hf_model_id,
        "OPTION_MAX_MODEL_LEN": "10000",
        "OPTION_GPU_MEMORY_UTILIZATION": "0.95",
        "MAX_CONCURRENT_REQUESTS": "10",  # reduce concurrent requests to increase context length
        "SM_NUM_GPUS": json.dumps(number_of_gpu),
    },
    role=sagemaker.get_execution_role(),
    name=model_name,
    sagemaker_session=sagemaker.Session(),
)

model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=600,
)
```
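
And for completeness, once an endpoint does come up, this is how I'd smoke-test it; a minimal sketch assuming the TGI-style `inputs`/`parameters` payload:

```python
# Hedged sketch: invoke the endpoint with the low-level runtime client.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,  # same name used in model.deploy() above
    ContentType="application/json",
    Body=json.dumps({"inputs": "Hello!", "parameters": {"max_new_tokens": 64}}),
)
print(response["Body"].read().decode("utf-8"))
```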