Text Generation
Safetensors
qwen3
text-generation-inference
text2text-generation
conversational
8-bit precision
compressed-tensors
Qwen3-32B-INT8 / README.md
zankich's picture
initial commit
bdaae7c verified
metadata
license: apache-2.0
base_model:
  - Qwen/Qwen3-32B
datasets:
  - TokenBender/code_instructions_122k_alpaca_style
  - glaiveai/glaive-code-assistant-v2
  - google/code_x_glue_ct_code_to_text
pipeline_tag: text2text-generation
tags:
  - text-generation-inference

GPTQ INT8 W8A8 quantized Qwen/Qwen3-32B

GPTQ INT8 W8A8 quantized Qwen/Qwen3-32B calibrated with a sequence len of 4096 and 128 samples of TokenBender/code_instructions_122k_alpaca_style, glaiveai/glaive-code-assistant-v2, google/code_x_glue_ct_code_to_text for a total sample size of 1024.

Follow the Qwen/Qwen3-32B docs for running with vllm.