GLM-4.6V-Flash Modified Tokenizer

This is a modified version of the tokenizer for GLM-4.6V-Flash. Full details are available on the parent model's card.

What's Changed

The chat template has been modified to preserve reasoning content (<think> blocks) from the previous assistant turn in addition to the current one. This creates a "rolling window" of visible thinking, allowing important thoughts to be carried forward across turns without flooding the context with tokens from all previous reasoning blocks.
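The effect can be inspected by rendering a multi-turn conversation with the standard `transformers` API. Below is a minimal sketch, assuming the tokenizer is loaded directly from this repository (`ericbotti/GLM-4.6V-Flash`); the conversation content is purely illustrative.

```python
from transformers import AutoTokenizer

# Repo ID assumed from this model card; adjust if the tokenizer lives elsewhere.
tokenizer = AutoTokenizer.from_pretrained("ericbotti/GLM-4.6V-Flash")

# Illustrative multi-turn history in which assistant turns contain
# <think> reasoning blocks.
messages = [
    {"role": "user", "content": "What is 17 * 24?"},
    {"role": "assistant", "content": "<think>17 * 24 = 408.</think>The answer is 408."},
    {"role": "user", "content": "Now add 92 to that."},
    {"role": "assistant", "content": "<think>408 + 92 = 500.</think>That gives 500."},
    {"role": "user", "content": "And if we halve it?"},
]

# Render the prompt (without tokenizing) to see which <think> blocks survive:
# older reasoning is stripped, while recent reasoning is carried forward.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```

Printing the rendered prompt makes it easy to compare against the stock template, which typically strips reasoning from all prior assistant turns.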
