VLLM or SGLang?
#3
by dipta007
Does the model support vllm or sglang?
vLLM is supported.
vLLM works via Docker Compose:

```yaml
services:
  vllm-openai:
    image: vllm/vllm-openai:v0.8.5.post1
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - /opt/vllm/models/:/models/
    environment:
      - HF_HUB_OFFLINE=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: --model ModelSpace/GemmaX2-28-9B-v0.1 --task generate --served-model-name "GemmaX2" --gpu-memory-utilization 0.9 --cpu-offload-gb 56
```
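With the config saved as `docker-compose.yml`, the stack can be started and watched until the model finishes loading. A minimal sketch (service name taken from the snippet above; your compose file location may differ):

```shell
# Start the vLLM service in the background
docker compose up -d vllm-openai

# Follow the logs until model loading completes and the server reports it is listening on :8000
docker compose logs -f vllm-openai
```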
Test the API using `/docs` (Swagger UI) or by POSTing to `/v1/chat/completions`:

```json
{
  "model": "GemmaX2",
  "messages": [
    {
      "role": "user",
      "content": "Translate this from Arabic to English: Arabic: أنا أحب الترجمة الآلية English:"
    }
  ],
  "max_tokens": 512
}
```
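The same request can be sent from Python using only the standard library. This is a minimal sketch assuming the server above is reachable at `localhost:8000`; the `build_translation_request` helper name and the prompt template (taken from the example request above) are illustrative:

```python
import json
import urllib.request


def build_translation_request(text: str, src: str, tgt: str,
                              model: str = "GemmaX2",
                              max_tokens: int = 512) -> dict:
    """Build a /v1/chat/completions payload using the translation
    prompt format shown in the example request above."""
    prompt = f"Translate this from {src} to {tgt}: {src}: {text} {tgt}:"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_request(payload: dict,
                 url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to the OpenAI-compatible endpoint served by vLLM
    and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_translation_request("أنا أحب الترجمة الآلية", "Arabic", "English")
    # Requires the docker-compose stack above to be running.
    print(send_request(payload)["choices"][0]["message"]["content"])
```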