MissingBlocksError
Hello! I'm currently running bloomz-petals in a Google Colab notebook to make use of the free GPU. However, I've recently started receiving the following error:
MissingBlocksError: No servers holding blocks 0 are online.
You can check the public swarm's state at http://health.petals.ml
If there are not enough servers, please connect your GPU:
https://github.com/bigscience-workshop/petals#connect-your-gpu-and-increase-petals-capacity
The public swarm's state seems okay and I can sometimes get everything to work (it's a bit of a dice roll). If anyone could please explain this error and give ideas to solve it, that would be extremely helpful! Thank you!
Hi! Sorry for the delayed response. I also remember some re-balancing issues. TBH I'm not 100% sure what happened either, but my best guess is that several peers exited simultaneously, which resulted in a temporary loss of connectivity.
Is it any better this week?
Hello and thanks for getting back to me! This issue has been persisting for the past week (maybe getting worse? I'd usually be able to run the model after retrying ~10 times). Any advice would be greatly appreciated!
Did you resolve this error? I'm also getting it.
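If the error really is transient, as the guess above about peers leaving at the same time suggests, one workaround is to automate the manual retrying with a growing delay between attempts. The following is only a minimal sketch under that assumption, not a fix from the Petals maintainers. The helper `retry_with_backoff` and its parameters are hypothetical names introduced here, and it does not import Petals: you pass in the exception class to retry on yourself.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 10,
    base_delay: float = 5.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until it succeeds or max_attempts is reached.

    Only exceptions listed in retry_on trigger a retry; anything else
    propagates immediately. The wait doubles after each failed attempt
    and is capped at max_delay seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            # Out of attempts: re-raise the last error unchanged.
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)

    # The loop always returns or raises; this line is a safeguard only.
    raise AssertionError("unreachable")
```

To use it, wrap whatever call raises the error in your notebook in a zero-argument function and pass it as `fn`, with `retry_on` set to a tuple containing the `MissingBlocksError` class from your installed Petals version. This only helps if servers holding the missing blocks come back online while you wait; if the swarm stays short of servers, the last error is raised again after the final attempt.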