VLLM or SGLang?

#3
by dipta007 - opened

Does the model support vLLM or SGLang?

vLLM is supported.

vLLM works with Docker Compose:

services:
  vllm-openai:
    image: vllm/vllm-openai:v0.8.5.post1
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - /opt/vllm/models/:/models/
    environment:
      - HF_HUB_OFFLINE=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: --model ModelSpace/GemmaX2-28-9B-v0.1 --task generate --served-model-name "GemmaX2" --gpu-memory-utilization 0.9 --cpu-offload-gb 56
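Once the container is up, you can check that the server is actually serving the model before trying translations. A minimal sketch using Python's `requests` against vLLM's OpenAI-compatible `/v1/models` endpoint; the host and port are assumptions matching the `8000:8000` mapping in the compose file above:

```python
import requests

# Assumes the "8000:8000" port mapping from the compose file above;
# adjust if you exposed the server elsewhere.
BASE_URL = "http://localhost:8000"


def served_models(base_url: str = BASE_URL) -> list[str]:
    """Return the model names the vLLM server reports via /v1/models.

    With the compose file above this should contain "GemmaX2", the value
    passed with --served-model-name.
    """
    resp = requests.get(f"{base_url}/v1/models", timeout=30)
    resp.raise_for_status()
    return [m["id"] for m in resp.json()["data"]]
```

Calling `served_models()` against the running container should list `GemmaX2`; an empty list or a connection error means the container is still loading weights or the port mapping differs.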

Test the API via /docs (Swagger UI) or by POSTing to /v1/chat/completions:

 {"model":"GemmaX2","messages":[{"role":"user","content":"Translate this from Arabic to English: Arabic: أنا أحب الترجمة الآلية English:"}],"max_tokens":512}
