Model Overview
- Model Architecture: Qwen3-VL-235B-A22B-Instruct
- Input: Text, Image
- Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0
- Operating System(s): Linux
- Inference Engine: vLLM
- Model Optimizer: AMD-Quark
- Weight quantization: Per-channel, FP8 E4M3, Static
- Activation quantization: Per-token, FP8 E4M3, Dynamic
- Calibration Dataset: Pile
This model was built from the Qwen3-VL-235B-A22B-Instruct model by applying AMD-Quark for PTPC (per-channel weight, per-token activation) FP8 quantization.
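For intuition, the PTPC scheme can be sketched in a few lines of PyTorch: each weight output channel gets one static scale computed offline, while each activation token gets a fresh scale at inference time. This is a minimal illustrative sketch only, not AMD-Quark's actual implementation; the function name and tensor shapes are assumptions.

import torch

def ptpc_fp8_sketch(weight: torch.Tensor, activation: torch.Tensor):
    # Illustrative sketch only -- not AMD-Quark's implementation.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

    # Per-channel, static: one scale per weight output channel, computed offline.
    w_scale = weight.abs().amax(dim=1, keepdim=True) / fp8_max
    w_fp8 = (weight / w_scale).to(torch.float8_e4m3fn)

    # Per-token, dynamic: one scale per activation token, computed at runtime.
    a_scale = activation.abs().amax(dim=-1, keepdim=True) / fp8_max
    a_fp8 = (activation / a_scale).to(torch.float8_e4m3fn)

    return w_fp8, w_scale, a_fp8, a_scale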
Model Quantization
The model was quantized from Qwen/Qwen3-VL-235B-A22B-Instruct using AMD-Quark. Weights are quantized to per-channel static FP8 and activations to per-token dynamic FP8.
Quantization script:
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 internal_scripts/quantize_quark.py --model_dir Qwen/Qwen3-VL-235B-A22B-Instruct \
--quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
--exclude_layers "lm_head" "*mlp.gate" "*.visual.*" \
--num_calib_data 512 \
--output_dir amd/Qwen3-VL-235B-A22B-Instruct-ptpc \
--model_export hf_format
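Note that --exclude_layers keeps the lm_head, the MoE router gates (*mlp.gate), and the vision tower (*.visual.*) in their original precision; only the language-model linear layers are quantized.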
Deployment
Use with vLLM
This model can be deployed efficiently using the vLLM backend.
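A minimal offline-inference sketch follows. The repository id mirrors the --output_dir used in the quantization script above, and the tensor_parallel_size is an assumption that should be matched to your MI350/MI355 node.

from vllm import LLM, SamplingParams

# Repo id assumed from the quantization --output_dir above.
llm = LLM(
    model="amd/Qwen3-VL-235B-A22B-Instruct-ptpc",
    tensor_parallel_size=8,  # assumption: one 8-GPU node
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Describe FP8 PTPC quantization in one sentence."], params)
print(outputs[0].outputs[0].text)

The same checkpoint can be served over an OpenAI-compatible API with: vllm serve amd/Qwen3-VL-235B-A22B-Instruct-ptpc --tensor-parallel-size 8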
Evaluation
The evaluation results and reproduction script are being prepared.
License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.