No output / Repeated outputs when using Gemma 3 12B/27B on vLLM
I have hosted Gemma 3 27B and 12B on 4 L4 GPUs using vLLM and am trying to translate a few documents from English to Indic languages. However, I am either getting no output in the target language or getting repetitions in English. The vLLM serve command for these models is below. I tried the exact same settings with sarvam-translate and it just works out of the box.
I have tried adjusting the generation parameters and even tried smaller sentences, but it does not work. Am I missing something here?
This is my vLLM serve command:
```
vllm serve google/gemma-3-12b-it \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --port 8000 \
    --max-model-len 8192 \
    --enable-chunked-prefill \
    --gpu-memory-utilization 0.9
```
Vanilla client code that I have been trying:
```python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use whichever model the server is hosting.
models = client.models.list()
model = models.data[0].id

tgt_lang = 'Hindi'
input_txt = 'Be the change you wish to see in the world.'
messages = [
    {"role": "system", "content": f"Translate the text below to {tgt_lang}."},
    {"role": "user", "content": input_txt},
]

response = client.chat.completions.create(
    model=model, messages=messages, temperature=0.01
)
output_text = response.choices[0].message.content

print("Input:", input_txt)
print("Translation:", output_text)
```
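One thing worth checking: earlier Gemma chat templates did not accept a separate `system` role (vLLM applies the model's own chat template), so a common workaround is to fold the instruction into the user turn. This is only a sketch of that change, reusing the variable names from the snippet above; it may or may not be the cause here:

```python
# Workaround sketch: merge the system instruction into the user message,
# since some Gemma chat templates have not supported a standalone
# "system" role. Variable names match the snippet above.
tgt_lang = 'Hindi'
input_txt = 'Be the change you wish to see in the world.'

messages = [
    {
        "role": "user",
        "content": f"Translate the text below to {tgt_lang}.\n\n{input_txt}",
    }
]
# `messages` can then be passed to client.chat.completions.create as before.
```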
I have this problem too.
Having the same issue. Hope someone at Google replies soon.
Me too. For image-to-text it works fine.