Deployment on Amazon SageMaker Endpoint

#33
by dgallitelli - opened

Hello! I tried to deploy this model on a SageMaker Endpoint, with both the TGI and LMI containers, to no avail. Both fail with the same error: NotImplementedError: sharded is not supported for AutoModel. Any suggestions?

Images used:

  • LMI: 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124
  • TGI: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi3.0.1-gpu-py311-cu124-ubuntu22.04
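
For reference, these are the same URIs the SageMaker SDK resolves for me rather than hand-written ones; a minimal sketch of how I looked them up, assuming the framework names and version strings below map to the tags above:

import sagemaker
from sagemaker.huggingface import get_huggingface_llm_image_uri

session = sagemaker.Session()
region = session.boto_region_name

# TGI container (version string assumed to map to the tgi3.0.1 tag above)
tgi_image_uri = get_huggingface_llm_image_uri("huggingface", version="3.0.1")

# LMI container (framework name and version assumed; should map to
# djl-inference:0.31.0-lmi13.0.0 above)
lmi_image_uri = sagemaker.image_uris.retrieve(
    framework="djl-lmi", region=region, version="0.31.0"
)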

Code used for deployment:

import json

import sagemaker

# inference_image_uri, hf_model_id, number_of_gpu, model_name,
# endpoint_name, and instance_type are defined earlier in the notebook.
model = sagemaker.Model(
    image_uri=inference_image_uri,  # one of the LMI/TGI images listed above
    env={
        "HF_MODEL_ID": hf_model_id,
        "OPTION_MAX_MODEL_LEN": "10000",
        "OPTION_GPU_MEMORY_UTILIZATION": "0.95",
        "MAX_CONCURRENT_REQUESTS": "10",  # Reduce concurrent requests to increase context length
        "SM_NUM_GPUS": json.dumps(number_of_gpu),
    },
    role=sagemaker.get_execution_role(),
    name=model_name,
    sagemaker_session=sagemaker.Session(),
)

model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=600,
)
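
If I read the error right, TGI is falling back to transformers' AutoModel for this architecture, and that fallback cannot be sharded across GPUs. A minimal single-GPU variant I plan to try next, assuming SM_NUM_GPUS > 1 is what triggers sharded mode:

# Same deployment, but pinned to a single GPU so the container never
# attempts sharding (assumption: SM_NUM_GPUS > 1 forces sharded mode,
# which the AutoModel fallback does not support).
model = sagemaker.Model(
    image_uri=inference_image_uri,
    env={
        "HF_MODEL_ID": hf_model_id,
        "OPTION_MAX_MODEL_LEN": "10000",
        "OPTION_GPU_MEMORY_UTILIZATION": "0.95",
        "MAX_CONCURRENT_REQUESTS": "10",
        "SM_NUM_GPUS": "1",  # single GPU: avoids the sharded AutoModel path
    },
    role=sagemaker.get_execution_role(),
    name=model_name,
    sagemaker_session=sagemaker.Session(),
)

Any pointers on getting multi-GPU sharding to work would still be appreciated.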