This is an FP8-dynamic quantization of QwQ-32B.

QwQ-32B is the medium-sized reasoning model of the Qwen series, built for tasks that require advanced thinking and problem-solving. With 32.5 billion parameters, it handles complex reasoning tasks and outperforms conventional instruction-tuned models by a significant margin. Its transformer architecture uses RoPE, SwiGLU, and RMSNorm and supports context lengths of up to 131,072 tokens. The model targets challenging downstream tasks such as hard mathematical problems and standardized multiple-choice questions, where sophisticated reasoning is required.
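The card does not document how this checkpoint was produced. The sketch below shows the typical way an FP8-dynamic checkpoint in compressed-tensors format is created with llm-compressor; the use of that library and these exact settings are assumptions, not a statement of how cortecs built this model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "Qwen/QwQ-32B"

# Load the original BF16 checkpoint.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC: FP8 weights with static per-channel scales and dynamic per-token
# activation scales; no calibration data is needed. lm_head stays in higher precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format so vLLM can load it directly.
model.save_pretrained("QwQ-32B-FP8-Dynamic", save_compressed=True)
tokenizer.save_pretrained("QwQ-32B-FP8-Dynamic")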

Evaluations

This model provides an accuracy recovery of 100.0%.

English       QwQ-32B    QwQ-32B-FP8-Dynamic (this)
Avg.          74.05      74.05
ARC           72.7       72.8
Hellaswag     75.4       75.3
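Accuracy recovery is the quantized model's average score divided by the original model's average: 74.05 / 74.05 = 100.0%.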

We did not check for data contamination. Evaluation was done using the Evaluation Harness (lm-eval) with limit=1000.
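The exact evaluation command is not published here; an invocation along these lines matches that setup (the task names and the vLLM backend are assumptions):

lm_eval --model vllm \
    --model_args pretrained=cortecs/QwQ-32B-FP8-Dynamic \
    --tasks arc_challenge,hellaswag \
    --limit 1000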

Usage

Install vLLM:
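pip install vllm

Run the OpenAI-compatible server: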

python -m vllm.entrypoints.openai.api_server \
    --model cortecs/QwQ-32B-FP8-Dynamic \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/QwQ-32B-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
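Because vLLM exposes an OpenAI-compatible API, the openai Python client works against the same endpoint. The sketch below mirrors the curl call; the placeholder API key is required by the client but ignored by vLLM, and max_tokens is an arbitrary choice.

from openai import OpenAI

# Point the client at the local vLLM server; the key is unused but must be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="cortecs/QwQ-32B-FP8-Dynamic",
    prompt="San Francisco is a",
    max_tokens=64,
)
print(response.choices[0].text)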
Model size: 32.8B params (Safetensors) · Tensor types: BF16, F8_E4M3

Model tree for cortecs/QwQ-32B-FP8-Dynamic

Base model: Qwen/Qwen2.5-32B
Finetuned: Qwen/QwQ-32B
Quantized: this model