About the "model_max_length": 16384
#11 opened by AlexWuKing
Same doubt
Same issue. For vLLM, I found the following example on the main page:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

where the max model length is set to 32768, not 16384.