VLLM or SGLang?
#3
by dipta007
Does the model support vllm or sglang?
vLLM is supported.
vLLM works via Docker Compose:

```yaml
services:
  vllm-openai:
    image: vllm/vllm-openai:v0.8.5.post1
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - /opt/vllm/models/:/models/
    environment:
      - HF_HUB_OFFLINE=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: --model ModelSpace/GemmaX2-28-9B-v0.1 --task generate --served-model-name "GemmaX2" --gpu-memory-utilization 0.9 --cpu-offload-gb 56
```
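With the config saved as `docker-compose.yml`, the stack can be started and watched until the model finishes loading. A minimal sketch (service name taken from the snippet above; your compose file location may differ):

```shell
# Start the vLLM service in the background
docker compose up -d vllm-openai

# Follow the logs until model loading completes and the server reports it is listening on :8000
docker compose logs -f vllm-openai
```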
Test the API using `/docs` (Swagger UI) or by POSTing to `/v1/chat/completions`:

```json
{
  "model": "GemmaX2",
  "messages": [
    {
      "role": "user",
      "content": "Translate this from Arabic to English: Arabic: أنا أحب الترجمة الآلية English:"
    }
  ],
  "max_tokens": 512
}
```
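The same request can be sent from Python using only the standard library. This is a minimal sketch assuming the server above is reachable at `localhost:8000`; the `build_translation_request` helper name and the prompt template (taken from the example request above) are illustrative:

```python
import json
import urllib.request


def build_translation_request(text: str, src: str, tgt: str,
                              model: str = "GemmaX2",
                              max_tokens: int = 512) -> dict:
    """Build a /v1/chat/completions payload using the translation
    prompt format shown in the example request above."""
    prompt = f"Translate this from {src} to {tgt}: {src}: {text} {tgt}:"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_request(payload: dict,
                 url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to the OpenAI-compatible endpoint served by vLLM
    and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_translation_request("أنا أحب الترجمة الآلية", "Arabic", "English")
    # Requires the docker-compose stack above to be running.
    print(send_request(payload)["choices"][0]["message"]["content"])
```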