bias error and missing tokenizer.json

#1
by Testator - opened

Heads-up for anyone loading the Pixtral-Large-Instruct-2411 EXL2 quant (4 bpw):

The original Mistral weights ship without bias tensors in the three multi_modal_projector.linear_* layers, so config.json correctly sets:

"multimodal_projector_bias": false

During the EXL2 conversion, the exporter automatically created zero-initialised bias tensors for those layers. That means the quantised checkpoint does contain bias parameters, while the flag still says “false”.
Back-ends that cross-check config vs. weights (e.g. ExLlama ≥ 0.3, TabbyAPI) will raise:

AssertionError: multi_modal_projector.linear_2 has bias tensor but bias is not expected
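If you want to check your own copy, a short sketch like this lists the projector tensors actually stored in the shards next to the config flag (it only assumes the usual config.json + *.safetensors layout of the quant folder):

import json, sys
from pathlib import Path
from safetensors import safe_open

model_dir = Path(sys.argv[1])  # folder containing config.json and the *.safetensors shards

cfg = json.loads((model_dir / "config.json").read_text())
print("multimodal_projector_bias =", cfg.get("multimodal_projector_bias"))

# list every multi_modal_projector tensor that is actually present in the shards
for shard in sorted(model_dir.glob("*.safetensors")):
    with safe_open(str(shard), framework="pt") as f:
        for name in f.keys():
            if "multi_modal_projector" in name:
                print(f"{shard.name}: {name}")

If the flag prints false but *.bias tensors show up, you have exactly the mismatch described above.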

Work-arounds

  1. Quick fix – edit config.json and set
    "multimodal_projector_bias": true
    (functionally harmless, because the extra tensors are 0-filled).

  2. Purist fix – strip the three bias tensors from all safetensors shards and keep the flag false (see the sketch after this list).

Either approach resolves the crash.
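
For the purist fix, here is a minimal sketch (back up the folder first; it assumes the shards are plain safetensors files plus an optional model.safetensors.index.json weight map, and it loads each shard into RAM while rewriting it):

import json
from pathlib import Path
from safetensors import safe_open
from safetensors.torch import save_file

model_dir = Path("/path/to/Pixtral-Large-Instruct-2411-exl2")  # adjust to your quant folder

def is_projector_bias(name: str) -> bool:
    # the zero-filled tensors the converter added
    return name.startswith("multi_modal_projector.linear_") and name.endswith(".bias")

# 1. rewrite every shard without the projector bias tensors
for shard in sorted(model_dir.glob("*.safetensors")):
    with safe_open(str(shard), framework="pt") as f:
        meta = f.metadata()
        names = list(f.keys())
        kept = {k: f.get_tensor(k) for k in names if not is_projector_bias(k)}
    dropped = [k for k in names if is_projector_bias(k)]
    if dropped:
        save_file(kept, str(shard), metadata=meta)
        print(f"{shard.name}: removed {dropped}")

# 2. drop the same entries from the weight map, if the quant ships one
index_path = model_dir / "model.safetensors.index.json"
if index_path.exists():
    index = json.loads(index_path.read_text())
    index["weight_map"] = {k: v for k, v in index["weight_map"].items()
                           if not is_projector_bias(k)}
    index_path.write_text(json.dumps(index, indent=2))

Afterwards the flag and the weights agree again, so no config.json edit is needed.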

Furthermore, I used the following script to generate the missing tokenizer.json:

#!/usr/bin/env python3
"""
fix_tokenizer.py
================
Creates the *missing* tokenizer artifacts that ExLlama‑v2 / TabbyAPI expect
for Mistral / Pixtral / Mixtral EXL2, AWQ or GPTQ weight folders.

Given any SentencePiece **tokenizer.model** (or `tokenizer.model.v7m1`) the
script will:
    • generate **tokenizer.json**  (HF fast‑tokenizer format)
    • create / patch **tokenizer_config.json** with a working chat template

Usage
-----
    python fix_tokenizer.py --model /path/to/tokenizer.model

Dependencies
------------
   pip install protobuf sentencepiece tokenizers "transformers>=4.42"

Works with ExLlama v2 ≥ 0.3.1, TabbyAPI, vLLM etc.
"""
import argparse, json, os, sys, textwrap

CHAT_TEMPLATE = textwrap.dedent(
    """{{bos_token}}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ ' [INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"""
)


def build_tokenizer_json(spm_path: str, out_dir: str):
    """Generate tokenizer.json & special_tokens_map.json via transformers."""
    try:
        from transformers import LlamaTokenizerFast
    except ImportError as e:
        print("[ERR] transformers>=4.42 required – pip install transformers", file=sys.stderr)
        raise e

    # vocab_file accepts SentencePiece binaries
    tok = LlamaTokenizerFast(vocab_file=spm_path, legacy=False)
    tok.save_pretrained(out_dir, legacy_format=False)  # writes tokenizer.json


def patch_tokenizer_config(cfg_path: str):
    with open(cfg_path, "r", encoding="utf-8") as f:
        try:
            cfg = json.load(f)
        except json.JSONDecodeError:
            cfg = {}

    cfg.update({
        "add_bos_token": False,
        "add_eos_token": False,
        "bos_token": "<s>",
        "eos_token": "</s>",
        "unk_token": "<unk>",
        "tokenizer_class": "LlamaTokenizer",
        "model_max_length": 2**63 - 1,
        "spaces_between_special_tokens": False,
        "chat_template": CHAT_TEMPLATE,
    })

    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)


def main():
    ap = argparse.ArgumentParser(description="Create HF tokenizer.json + config for ExLlama")
    ap.add_argument("--model", required=True, help="Path to tokenizer.model(.v7m1)")
    args = ap.parse_args()

    spm_path = os.path.abspath(args.model)
    if not os.path.isfile(spm_path):
        ap.error(f"{spm_path} is not a file")

    model_dir = os.path.dirname(spm_path)

    # 1. generate tokenizer.json
    build_tokenizer_json(spm_path, model_dir)

    # 2. ensure tokenizer_config.json exists and has template
    cfg_path = os.path.join(model_dir, "tokenizer_config.json")
    if not os.path.exists(cfg_path):
        with open(cfg_path, "w", encoding="utf-8") as f:
            json.dump({}, f)
    patch_tokenizer_config(cfg_path)

    print(f"✓ tokenizer.json & tokenizer_config.json written to {model_dir}")


if __name__ == "__main__":
    main()

Hope that saves someone a headache!
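
If you want to sanity-check what the script wrote, loading the folder back through transformers and rendering the chat template is a quick test (a small sketch; the path is a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/model/folder")  # picks up the new tokenizer.json

# round-trip a sample string through the fast tokenizer
ids = tok.encode("Hello, Pixtral!")
print(ids, "->", tok.decode(ids))

# render the chat template the script patched into tokenizer_config.json
messages = [
    {"role": "user", "content": "Describe this meme."},
    {"role": "assistant", "content": "It's a cat."},
    {"role": "user", "content": "Why is it funny?"},
]
print(tok.apply_chat_template(messages, tokenize=False))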

I don't get any error loading it and I just copied a similar tokenizer: https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411/discussions/1

The issue I have is that the model starts to get wonky at around 8k ctx of multi-turn. It begins to break away from formatting and runs on with strings of related words. Coherence is definitely way down at that point compared to the beginning. Images take up a lot of ctx so I run it at 64k, but I'm nowhere near that point. Using Q6 cache, hopefully that's not the problem. I've heard vaguely of similar problems before, and of quanters using longer calibration datasets to get around it.

Perhaps people will eventually make an exl3 (and it gets TP, heh) so I can compare and contrast. This is one of the only large models that can chat with memes. Besides it you have gemma.. aaaand claude/gemini/etc. Clouds or pipsqueaks.

The model is definitely flagged wrongly in terms of bias, even if it doesn't lead to a crash anymore. I also don't think you can just take any other tokenizer that looks similar. I had ChatGPT explain why there wasn't one included in the first place, and it also wrote the script above to create the missing files. So maybe try that.

They are all just quantizations of the same model, so I'd hope they match the original. Upload the one the script made and we can diff them. The one I have is 3.7 MB.
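
Raw file diffs on tokenizer.json are noisy; comparing the parsed vocabularies says more (a rough sketch, the two file names are placeholders):

from tokenizers import Tokenizer

a = Tokenizer.from_file("tokenizer_from_script.json")  # the one the script generated
b = Tokenizer.from_file("tokenizer_copied.json")       # the one copied from the other repo

va, vb = a.get_vocab(), b.get_vocab()
print("vocab sizes:", len(va), len(vb))
print("tokens only in A:", sorted(set(va) - set(vb))[:20])
print("tokens only in B:", sorted(set(vb) - set(va))[:20])
shared = set(va) & set(vb)
print("shared tokens with different ids:", sum(va[t] != vb[t] for t in shared))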

I found the model does way better on long context if you don't feed it any images. Pure chat stayed coherent up to 32k. I will see if anything is different after changing the config flag.

On 5.0 bpw: assert not self.has_bias, self.key + " has no bias tensor but bias is expected"

I converted with the script and the tokenizer.json is the same. It does produce a broken chat template in the other file, though.
