Model Overview

  • Model Architecture: Qwen3-VL-235B-A22B-Instruct
    • Input: Text, Image
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: vLLM
  • Model Optimizer: AMD-Quark
    • Weight quantization: Per-channel, FP8 E4M3, Static
    • Activation quantization: Per-token, FP8 E4M3, Dynamic
  • Calibration Dataset: Pile

This model was built from the Qwen3-VL-235B-A22B-Instruct model by applying AMD-Quark for PTPC (per-token activation, per-channel weight) FP8 quantization.

Model Quantization

The model was quantized from Qwen/Qwen3-VL-235B-A22B-Instruct using AMD-Quark. Weights are quantized to FP8 E4M3 with static per-channel scales, and activations are quantized to FP8 E4M3 with dynamic per-token scales. The lm_head, MoE gating layers, and vision encoder are excluded from quantization.
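The scheme can be illustrated with a small NumPy sketch (an illustration only, not Quark's implementation): each weight row gets one static scale computed offline, while each activation row (token) gets a fresh scale at runtime. FP8 rounding is omitted; only the scaling and clipping are shown.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def per_channel_weight_scales(w):
    # One static scale per output channel (row), computed once from the weights.
    return np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX

def per_token_activation_scales(x):
    # One dynamic scale per token (row), recomputed for every input at runtime.
    return np.abs(x).max(axis=1, keepdims=True) / FP8_E4M3_MAX

def fake_quantize(t, scales):
    # Scale into the FP8 range, clip, then rescale back (FP8 rounding omitted).
    q = np.clip(t / scales, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))   # weight matrix: 4 output channels
x = rng.standard_normal((3, 8))   # activations: 3 tokens

w_dq = fake_quantize(w, per_channel_weight_scales(w))
x_dq = fake_quantize(x, per_token_activation_scales(x))
```

Because the weight scales are static they can be baked into the exported checkpoint, while the per-token activation scales are computed on the fly by the inference engine.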

Quantization scripts:

cd Quark/examples/torch/language_modeling/llm_ptq/

python3 internal_scripts/quantize_quark.py --model_dir Qwen/Qwen3-VL-235B-A22B-Instruct \
                          --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
                          --exclude_layers "lm_head" "*mlp.gate" "*.visual.*" \
                          --num_calib_data 512 \
                          --output_dir amd/Qwen3-VL-235B-A22B-Instruct-ptpc \
                          --model_export hf_format

Deployment

Use with vLLM

This model can be deployed efficiently using the vLLM backend.
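For example, the quantized checkpoint can be served with vLLM's OpenAI-compatible server. This is a sketch under assumptions: the model path matches the quantization output directory above, and --tensor-parallel-size should match the number of GPUs available.

```shell
# Launch the OpenAI-compatible server (adjust parallelism to your hardware).
vllm serve amd/Qwen3-VL-235B-A22B-Instruct-ptpc \
    --tensor-parallel-size 8

# Query the server once it is up.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "amd/Qwen3-VL-235B-A22B-Instruct-ptpc",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```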

Evaluation

The evaluation results and reproduction script are being prepared.

License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
