Qwen3-Reranker-4B-seq-cls-vllm-fixed

This is a fixed version of the Qwen3-Reranker-4B model converted to sequence classification format, optimized for use with vLLM.

Model Description

This model is a pre-converted version of Qwen/Qwen3-Reranker-4B that:

  • Has been converted from CausalLM to SequenceClassification architecture
  • Includes proper configuration for vLLM compatibility
  • Provides ~75,000x reduction in classification head size
  • Offers ~150,000x fewer operations per token compared to using the full LM head

Key Improvements

The original converted model (tomaarsen/Qwen3-Reranker-4B-seq-cls) was missing critical vLLM configuration attributes. This version adds:

{
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}

These configurations are essential for vLLM to properly handle the pre-converted weights.

Usage with vLLM

vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \
    --task score \
    --served-model-name qwen3-reranker-4b \
    --disable-log-requests
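Once the server is up, pairs can be scored over HTTP. A minimal sketch, assuming the server listens on the default port 8000 and exposes vLLM's score route (check your vLLM version's API docs for the exact path):

```shell
# Score one query/document pair against the served reranker.
# "qwen3-reranker-4b" matches --served-model-name above.
curl -s http://localhost:8000/score \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-reranker-4b",
        "text_1": "What is the capital of France?",
        "text_2": "Paris is the capital of France."
    }'
```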

Python Example

from vllm import LLM

llm = LLM(
    model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed",
    task="score"
)

queries = ["What is the capital of France?"]
documents = ["Paris is the capital of France."]

outputs = llm.score(queries, documents)
scores = [output.outputs.score for output in outputs]
print(scores)
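With several candidate documents per query, the returned scores are typically used to rerank before passing the top hits downstream. A small sketch of the sort step, using placeholder scores standing in for `llm.score(...)` output:

```python
# Rerank documents by relevance score, highest first.
# The scores below are hypothetical, standing in for real model outputs.
documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
]
scores = [0.02, 0.97, 0.41]  # placeholder scores

ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```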

Performance

This model performs identically to the original Qwen3-Reranker-4B when used with proper configuration, while providing significant efficiency improvements:

  • Memory: ~600MB → ~8KB for classification head
  • Compute: 151,936 logits → 1 logit per forward pass
  • Speed: Faster inference due to reduced computation
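The compute figure follows directly from the head shapes: the original LM head emits one logit per vocabulary entry, while the classification head emits a single relevance score. A quick check:

```python
# Back-of-the-envelope check of the per-forward-pass logit reduction.
vocab_size = 151_936            # Qwen3 vocabulary size (LM head outputs)
logits_causal_lm = vocab_size   # one logit per vocabulary token
logits_seq_cls = 1              # a single relevance score

print(f"{logits_causal_lm:,} -> {logits_seq_cls} logit per forward pass")
print(f"~{logits_causal_lm // logits_seq_cls:,}x fewer output logits")
```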

Technical Details

  • Architecture: Qwen3ForSequenceClassification
  • Base Model: Qwen/Qwen3-Reranker-4B
  • Conversion Method: from_2_way_softmax (yes_logit - no_logit)
  • Model Size: 4B parameters
  • Task: Reranking/Scoring
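The from_2_way_softmax conversion amounts to collapsing two LM-head rows into one classifier row: the weight for the single label is the "yes" row minus the "no" row, so the one output logit equals yes_logit - no_logit. A toy NumPy check with random weights and hypothetical token ids:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 8, 32    # toy sizes; the real model uses far larger dimensions
yes_id, no_id = 5, 3     # hypothetical ids for the "yes" / "no" tokens

lm_head = rng.normal(size=(vocab, hidden))  # original CausalLM head
h = rng.normal(size=hidden)                 # final hidden state of the sequence

# from_2_way_softmax: single classifier row = row("yes") - row("no")
cls_weight = lm_head[yes_id] - lm_head[no_id]

full_logits = lm_head @ h
# The 1-logit head reproduces yes_logit - no_logit exactly.
assert np.isclose(cls_weight @ h, full_logits[yes_id] - full_logits[no_id])
print("score:", cls_weight @ h)
```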

Citation

If you use this model, please cite the original Qwen3-Reranker:

@misc{qwen3reranker2025,
  title={Qwen3-Reranker},
  author={Qwen Team},
  year={2025},
  publisher={Hugging Face}
}

License

Apache 2.0 (inherited from the base model)
