Model Overview

  • Model Architecture: Qwen3-VL-235B-A22B-Instruct
    • Input: Text, Image
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: vLLM
  • Model Optimizer: AMD-Quark
    • Weight quantization: Per-channel, FP8 E4M3, Static
    • Activation quantization: Per-token, FP8 E4M3, Dynamic
  • Calibration Dataset: Pile

This model was built from the Qwen3-VL-235B-A22B-Instruct model by applying AMD-Quark for PTPC (per-token activation, per-channel weight) FP8 quantization.

Model Quantization

The model was quantized from Qwen/Qwen3-VL-235B-A22B-Instruct using AMD-Quark. Weights are quantized to FP8 E4M3 with static per-channel scales, and activations are quantized to FP8 E4M3 with dynamic per-token scales. The lm_head, MoE gating layers, and vision encoder are excluded from quantization.
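The scheme can be illustrated with a small NumPy sketch (an illustration only, not Quark's implementation): each weight row gets one static scale computed offline, while each activation row (token) gets a fresh scale at runtime. FP8 rounding is omitted; only the scaling and clipping are shown.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def per_channel_weight_scales(w):
    # One static scale per output channel (row), computed once from the weights.
    return np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX

def per_token_activation_scales(x):
    # One dynamic scale per token (row), recomputed for every input at runtime.
    return np.abs(x).max(axis=1, keepdims=True) / FP8_E4M3_MAX

def fake_quantize(t, scales):
    # Scale into the FP8 range, clip, then rescale back (FP8 rounding omitted).
    q = np.clip(t / scales, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))   # weight matrix: 4 output channels
x = rng.standard_normal((3, 8))   # activations: 3 tokens

w_dq = fake_quantize(w, per_channel_weight_scales(w))
x_dq = fake_quantize(x, per_token_activation_scales(x))
```

Because the weight scales are static they can be baked into the exported checkpoint, while the per-token activation scales are computed on the fly by the inference engine.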

Quantization scripts:

cd Quark/examples/torch/language_modeling/llm_ptq/

python3 internal_scripts/quantize_quark.py --model_dir Qwen/Qwen3-VL-235B-A22B-Instruct \
                          --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
                          --exclude_layers "lm_head" "*mlp.gate" "*.visual.*" \
                          --num_calib_data 512 \
                          --output_dir amd/Qwen3-VL-235B-A22B-Instruct-ptpc \
                          --model_export hf_format

Deployment

Use with vLLM

This model can be deployed efficiently using the vLLM backend.
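For example, the quantized checkpoint can be served with vLLM's OpenAI-compatible server. This is a sketch under assumptions: the model path matches the quantization output directory above, and --tensor-parallel-size should match the number of GPUs available.

```shell
# Launch the OpenAI-compatible server (adjust parallelism to your hardware).
vllm serve amd/Qwen3-VL-235B-A22B-Instruct-ptpc \
    --tensor-parallel-size 8

# Query the server once it is up.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "amd/Qwen3-VL-235B-A22B-Instruct-ptpc",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```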

Evaluation

The evaluation results and reproduction script are being prepared.

License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
