Deployment issue in AWS SageMaker and GCP
Hi Team,
I tried deploying the StarCoder2-15B model on both AWS SageMaker and GCP.
On both platforms, the deployment fails with the error below:
raise ValueError(
ValueError: The checkpoint you are trying to load has model type starcoder2 but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
For SageMaker, I followed the same steps shown in the Deployment tab:
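from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# sess (SageMaker session) and role (IAM role) are defined earlier in the notebook (not shown)

# retrieve the Hugging Face LLM container image uri
llm_model_image = get_huggingface_llm_image_uri('huggingface',
    version="1.4.2", session=sess)

config = {
    'model_id' : 'bigcode/starcoder2-15b',
    'instance_type' : 'ml.g5.2xlarge',
    'num_gpus' : '1',
    # remaining keys (such as 'endpoint_name', used below) are not shown in this post
}
hp = hyperparameters(config)  # helper from the notebook, not shown in this post

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_model_image,
    env=hp
)
estimator = llm_model.deploy(
    initial_instance_count=1,
    instance_type=config['instance_type'],
    endpoint_name=config['endpoint_name'],
    container_startup_health_check_timeout=600,  # 10 minutes to be able to load the model
)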
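The hyperparameters helper is not shown in the post, so the sketch below is only a guess at what it produces. It assumes the helper maps the config to the HF_MODEL_ID and SM_NUM_GPUS environment variables read by the Hugging Face LLM container. The function name build_container_env is hypothetical and stands in for the helper.

```python
import json


def build_container_env(config):
    """Hypothetical stand-in for the post's `hyperparameters` helper.

    Assumption: the helper only maps the model id and GPU count to the
    environment variables read by the Hugging Face LLM container.
    """
    return {
        "HF_MODEL_ID": config["model_id"],
        "SM_NUM_GPUS": str(config["num_gpus"]),
    }


# Values copied from the post; other config keys are omitted here.
config = {
    "model_id": "bigcode/starcoder2-15b",
    "instance_type": "ml.g5.2xlarge",
    "num_gpus": "1",
}

hp = build_container_env(config)
print(json.dumps(hp, indent=2))
```

Printing the dictionary before deploying makes it easy to confirm which model id and GPU count the container will actually receive.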
In a standalone notebook, I am able to download the model using Transformers version 4.39.2.
Could you please help with deploying this model on SageMaker/GCP?
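Since the error message itself mentions an out-of-date Transformers, one way to narrow this down is to compare the version installed in the failing environment with the one that works in the notebook. Below is a minimal sketch using only the standard library. The threshold 4.39.2 is simply the version reported as working in this post, not a confirmed minimum, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumption: 4.39.2 is the version reported as working in the notebook;
# the real minimum version for starcoder2 support may be lower.
KNOWN_GOOD = "4.39.2"


def parse_version(version):
    """Turn '4.39.2' or '4.39.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, required=KNOWN_GOOD):
    """Return True if the installed version is not older than the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed in this environment")
    elif is_at_least(found):
        print(f"transformers {found} is at least {KNOWN_GOOD}")
    else:
        print(f"transformers {found} is older than {KNOWN_GOOD}")
```

Running the same script in the notebook and inside the serving container would show whether the two environments actually differ.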