Jinja template error?

#4
by bjodah - opened

Using Unsloth's quants of Qwen3 (I've tried 0.6B, 1.7B and 4B), I see the following error with recent llama.cpp (I've tried both the master branch as of writing and a build that was ~1 week old):

main: server is listening on http://127.0.0.1:8703 - starting the main loop
srv  update_slots: all slots are idle
srv  log_server_r: request: GET /health 127.0.0.1 200
got exception: {"code":500,"message":"Cannot subscript null at row 23, column 40:\n    {%- set tool_start_length = tool_start|length %}\n    {%- set start_of_message = message.content[:tool_start_length] %}\n                                       ^\n    {%- set tool_end = '</tool_response>' %}\n at row 23, column 5:\n    {%- set tool_start_length = tool_start|length %}\n    {%- set start_of_message = message.content[:tool_start_length] %}\n    ^\n    {%- set tool_end = '</tool_response>' %}\n at row 18, column 39:\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for forward_message in messages %}\n                                      ^\n    {%- set index = (messages|length - 1) - loop.index0 %}\n at row 18, column 1:\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for forward_message in messages %}\n^\n    {%- set index = (messages|length - 1) - loop.index0 %}\n at row 1, column 1:\n{%- if tools %}\n^\n    {{- '<|im_start|>system\\n' }}\n","type":"server_error"}
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 500
/.../ the same exception and 500 response repeated for two more POST /v1/chat/completions requests /.../

If I switch to Bartowski's quants, the problem goes away.

This is how llama.cpp (the OpenAI-compatible server) was launched. The Unsloth repo is the one that triggers the error; it is commented out here in favor of Bartowski's:

# swap in --hf-repo unsloth/Qwen3-0.6B-GGUF:Q8_K_XL to trigger the error
/opt/llama.cpp/build/bin/llama-server \
  --log-file /logs/llamacpp-Qwen3-0.6B.log \
  --port 8702 \
  --hf-repo bartowski/Qwen_Qwen3-0.6B-GGUF:Q8_0 \
  --n-gpu-layers 999 \
  --jinja \
  --cache-type-k q8_0 \
  --ctx-size 32768 \
  --samplers "top_k;dry;min_p;temperature;top_p" \
  --min-p 0.005 \
  --top-p 0.97 \
  --top-k 40 \
  --temp 0.7 \
  --dry-multiplier 0.7 \
  --dry-allowed-length 4 \
  --dry-penalty-last-n 2048 \
  --presence-penalty 0.05 \
  --frequency-penalty 0.005 \
  --repeat-penalty 1.01 \
  --repeat-last-n 16

Hang on, I need to provide a bit more information to reproduce this. I'm using logprobs via litellm. I'll get back to you once I have detailed reproduction instructions...

Ok, to reproduce this I need to send an empty string as the message content. Using Unsloth's GGUF:

request: {"model": "llamacpp-Qwen3-8B", "messages": [{"role": "user", "content": ""}]}          
got exception: {"code":500,"message":"Cannot subscript null at row 23, column 40: /.../ same traceback as in the first post /.../","type":"server_error"}
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 500                            
srv  log_server_r: request:  {"model": "llamacpp-Qwen3-8B", "messages": [{"role": "user", "content": ""}]}                             
srv  log_server_r: response: {"error":{"code":500,"message":"Cannot subscript null at row 23, column 40: /.../ same traceback as in the first post /.../","type":"server_error"}}

Using Bartowski's GGUF, the same request succeeds:

request: {"model": "llamacpp-Qwen3-4B", "messages": [{"role": "user", "content": ""}]}                                                 
srv  params_from_: Grammar:                                                                                                            
srv  params_from_: Grammar lazy: false                                                                                                 
srv  params_from_: Chat format: Content-only                                                                                           
srv  add_waiting_: add task 0 to waiting list. current waiting = 0 (before add)                                                                                                                                                                                               
que          post: new task, id = 0/1, front = 0                                                                                       
que    start_loop: processing new tasks                                                                                                
que    start_loop: processing task, id = 0                                                                                             
slot get_availabl: id  0 | task -1 | selected slot by lru, t_last = -1                                                                                                                                                                                                        
slot        reset: id  0 | task -1 |                                                                                                                                                                                                                                          
slot launch_slot_: id  0 | task 0 | launching slot : {"id":0,"id_task":0,"n_ctx":32768,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.699999988079071,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.9700000286102295,"min_p":0.004999999888241291,"top_n_sigma":-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":16,"repeat_penalty":1.0099999904632568,"presence_penalty":0.05000000074505806,"frequency_penalty":0.004999999888241291,"dry_multiplier":0.699999988079071,"dry_base":1.75,"dry_allowed_length":4,"dry_penalty_last_n":2048,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[],"chat_format":"Content-only","samplers":["top_k","dry","min_p","temperature","top_p"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>user\nNone<|im_end|>\n<|im_start|>assistant\n","next_token":{"has_next_token":true,"has_new_line":false,"n_remain":-1,"n_decoded":0,"stopping_word":""}}
slot launch_slot_: id  0 | task 0 | processing task
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 1, front = 0
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 9
slot update_slots: id  0 | task 0 | prompt token   0: 151644 '<|im_start|>'
slot update_slots: id  0 | task 0 | prompt token   1:    872 'user' 
slot update_slots: id  0 | task 0 | prompt token   2:    198 '


/.../  skipping logging for a bunch of tokens.... /.../


srv  to_json_oaic: Parsing chat message: <think>
Okay, the user just said "None." That's pretty vague. I need to figure out what they're asking for. Maybe they're looking for help with something but didn't specify. Let me check the history. There's no previous conversation, so I'm starting fresh.

Hmm, "None" could mean they don't have a specific question or they're testing the system. I should respond in a friendly way to encourage them to ask anything they need. Maybe offer assistance with general topics or ask how I can help. Let me make sure my response is open-ended and welcoming. I should avoid assuming they need something particular. Let's go with a simple, polite message that invites them to ask questions.
</think>

Hello! It seems like you might be testing the waters or have no specific question in mind. Feel free to ask me anything you need help with—whether it's a question, a problem to solve, or just curious about something. I'm here to assist! 😊
srv  remove_waiti: remove task 0 from waiting list. current waiting = 1 (before remove)
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv  log_server_r: request:  {"model": "llamacpp-Qwen3-4B", "messages": [{"role": "user", "content": ""}]}
srv  log_server_r: response: {"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"<think>\nOkay, the user just said \"None.\" That's pretty vague. I need to figure out what they're asking for. Maybe they're looking for help with something but didn't specify. Let me check the history. There's no previous conversation, so I'm starting fresh.\n\nHmm, \"None\" could mean they don't have a specific question or they're testing the system. I should respond in a friendly way to encourage them to ask anything they need. Maybe offer assistance with general topics or ask how I can help. Let me make sure my response is open-ended and welcoming. I should avoid assuming they need something particular. Let's go with a simple, polite message that invites them to ask questions.\n</think>\n\nHello! It seems like you might be testing the waters or have no specific question in mind. Feel free to ask me anything you need help with—whether it's a question, a problem to solve, or just curious about something. I'm here to assist! 😊"}}],"created":1746711403,"model":"llamacpp-Qwen3-4B","system_fingerprint":"b1-8733e0c","object":"chat.completion","usage":{"completion_tokens":203,"prompt_tokens":9,"total_tokens":212},"id":"chatcmpl-caOxzltaPITBebtJFNKOodmCkmhLygv3","__verbose":{"index":0,"content":"<think>\nOkay, the user just said \"None.\" That's pretty vague. I need to figure out what they're asking for. Maybe they're looking for help with something but didn't specify. Let me check the history. There's no previous conversation, so I'm starting fresh.\n\nHmm, \"None\" could mean they don't have a specific question or they're testing the system. I should respond in a friendly way to encourage them to ask anything they need. Maybe offer assistance with general topics or ask how I can help. Let me make sure my response is open-ended and welcoming. I should avoid assuming they need something particular. Let's go with a simple, polite message that invites them to ask questions.\n</think>\n\nHello! It seems like you might be testing the waters or have no specific question in mind. Feel free to ask me anything you need help with—whether it's a question, a problem to solve, or just curious about something. I'm here to assist! 😊","tokens":[],"id_slot":0,"stop":true,"model":"llamacpp-Qwen3-4B","tokens_predicted":203,"tokens_evaluated":9,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.699999988079071,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.9700000286102295,"min_p":0.004999999888241291,"top_n_sigma":-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":16,"repeat_penalty":1.0099999904632568,"presence_penalty":0.05000000074505806,"frequency_penalty":0.004999999888241291,"dry_multiplier":0.699999988079071,"dry_base":1.75,"dry_allowed_length":4,"dry_penalty_last_n":2048,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[],"chat_format":"Content-only","samplers":["top_k","dry","min_p","temperature","top_p"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>user\nNone<|im_end|>\n<|im_start|>assistant\n","has_new_line":true,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":211,"timings":{"prompt_n":9,"prompt_ms":20.59,"prompt_per_token_ms":2.287777777777778,"prompt_per_second":437.1053909664886,"predicted_n":203,"predicted_ms":1766.44,"predicted_per_token_ms":8.70167487684729,"predicted_per_second":114.92040488213581}},"timings":{"prompt_n":9,"prompt_ms":20.59,"prompt_per_token_ms":2.287777777777778,"prompt_per_second":437.1053909664886,"predicted_n":203,"predicted_ms":1766.44,"predicted_per_token_ms":8.70167487684729,"predicted_per_second":114.92040488213581}}

Sending an empty string as content is obviously a mistake on my end, but it took me a while to figure that out. Interestingly, the Bartowski log above shows the empty content rendered into the prompt as the literal string "None", so the empty string apparently becomes null somewhere along the way, and Unsloth's template then fails when it tries to slice message.content.
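
For reference, the failing expression is the slice of message.content at row 23 of the template. A guard along these lines would avoid subscripting a null value (this is just a sketch of the kind of fix, not necessarily what Unsloth actually changed):

{#- hypothetical guard: treat null content as an empty string before slicing -#}
{%- set content = message.content if message.content is not none else '' %}
{%- set start_of_message = content[:tool_start_length] %}

With something like that in place, an empty or missing content renders as an empty slice instead of raising.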

Unsloth AI org

Thank you for the detailed analysis, we'll take a look!!

Unsloth AI org

@bjodah we re-uploaded - see if it solves the problem

Unsloth AI org

I've accounted for empty strings now

I can confirm that your fix worked. Thank you!

bjodah changed discussion status to closed
