Quark Quantized Playable1

This is a fine-tuned and Quark-quantized version of Qwen/Qwen2.5-Coder-7B-Instruct, built with the 'iat-05-1' adapter.
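
A quick way to try the model is to load it with the Hugging Face Transformers library. The sketch below is illustrative, not a shipped usage script: the repo id is taken from the model tree at the bottom of this card, and depending on how the checkpoint was exported, loading the UINT4 AWQ weights may additionally require Quark's runtime or an AWQ-aware backend.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model tree below.
model_id = "playable/playable1-int4-bfloat16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # non-quantized tensors are stored as BF16
    device_map="auto",
)

# The base model is instruction-tuned, so use the chat template.
messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))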

Model Details

  • Base Model: Qwen/Qwen2.5-Coder-7B-Instruct
  • Adapter: iat-05-1
  • Quantization: Quark (UINT4 AWQ weights; non-quantized tensors in BFLOAT16)
  • Format: SafeTensors
  • Perplexity Score: 10.953088 (see the evaluation sketch after this list)
  • Evaluation Dataset: wikitext-2-raw-v1
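
For reference, the standard recipe for a wikitext-2-raw-v1 perplexity number is a sliding-window negative log-likelihood over the concatenated test split. The sketch below follows that common Hugging Face recipe; it is an assumption about the methodology, not the exact script that produced the score above.

import torch
from datasets import load_dataset

def wikitext2_perplexity(model, tokenizer, seq_len=2048, stride=2048):
    # Concatenate the raw test split, as is conventional for this benchmark.
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    input_ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids
    input_ids = input_ids.to(model.device)

    nll_sum, n_tokens = 0.0, 0
    for begin in range(0, input_ids.size(1), stride):
        window = input_ids[:, begin:begin + seq_len]
        if window.size(1) < 2:  # nothing left to score after label shifting
            break
        with torch.no_grad():
            # HF causal LMs shift labels internally; .loss is the mean
            # per-token NLL over the window (edge windows are approximate).
            loss = model(window, labels=window).loss
        nll_sum += loss.item() * window.size(1)
        n_tokens += window.size(1)
    return torch.tensor(nll_sum / n_tokens).exp().item()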

Quark Info

The model was quantized with the following Quark configuration:

Config(
    global_quant_config=QuantizationConfig(
        input_tensors=None,
        output_tensors=None,
        weight=QuantizationSpec(
            dtype=Dtype.uint4,
            observer_cls=<class 'quark.torch.quantization.observer.observer.PerGroupMinMaxObserver'>,
            is_dynamic=False,
            qscheme=QSchemeType.per_group,
            ch_axis=-1,
            group_size=128,
            symmetric=False,
            round_method=RoundType.half_even,
            scale_type=ScaleType.float,
            scale_format=None,
            scale_calculation_mode=None,
            qat_spec=None,
            mx_element_dtype=None,
            zero_point_type=ZeroPointType.int32,
            is_scale_quant=False,
        ),
        bias=None,
        target_device=None,
    ),
    layer_type_quant_config={},
    layer_quant_config={},
    kv_cache_quant_config={},
    kv_cache_group=['*k_proj', '*v_proj'],
    min_kv_scale=0.0,
    softmax_quant_spec=None,
    exclude=['[]'],
    algo_config=[
        AWQConfig(
            name="awq",
            scaling_layers=[
                {'prev_op': 'input_layernorm',
                 'layers': ['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj'],
                 'inp': 'self_attn.q_proj',
                 'module2inspect': 'self_attn'},
                {'prev_op': 'self_attn.v_proj',
                 'layers': ['self_attn.o_proj'],
                 'inp': 'self_attn.o_proj'},
                {'prev_op': 'post_attention_layernorm',
                 'layers': ['mlp.gate_proj', 'mlp.up_proj'],
                 'inp': 'mlp.gate_proj',
                 'module2inspect': 'mlp'},
                {'prev_op': 'mlp.up_proj',
                 'layers': ['mlp.down_proj'],
                 'inp': 'mlp.down_proj'},
            ],
            model_decoder_layers="model.layers",
        ),
    ],
    quant_mode=QuantizationMode.eager_mode,
    log_severity_level=1,
    version="0.10",
)
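
For context, the configuration above maps onto Quark's PyTorch API roughly as sketched below. The import paths follow AMD Quark's documented usage (the observer path matches the one printed in the dump), but treat the exact entry points as assumptions to verify against your installed Quark version; `model` and `calib_dataloader` are placeholders.

from quark.torch import ModelQuantizer
from quark.torch.quantization.config.config import (
    Config,
    QuantizationConfig,
    QuantizationSpec,
)
from quark.torch.quantization.config.type import (
    Dtype,
    QSchemeType,
    RoundType,
    ScaleType,
)
from quark.torch.quantization.observer.observer import PerGroupMinMaxObserver

# UINT4 asymmetric per-group (group_size=128) weight-only spec,
# mirroring the QuantizationSpec printed above.
uint4_wo_spec = QuantizationSpec(
    dtype=Dtype.uint4,
    observer_cls=PerGroupMinMaxObserver,
    is_dynamic=False,
    qscheme=QSchemeType.per_group,
    ch_axis=-1,
    group_size=128,
    symmetric=False,
    round_method=RoundType.half_even,
    scale_type=ScaleType.float,
)

quant_config = Config(
    global_quant_config=QuantizationConfig(weight=uint4_wo_spec),
    # algo_config would additionally carry the AWQConfig shown above;
    # its scaling_layers mapping is specific to Qwen2-style decoder blocks.
)

# `model` is the fine-tuned BF16 model and `calib_dataloader` yields
# calibration batches; both are placeholders here.
quantizer = ModelQuantizer(quant_config)
# quant_model = quantizer.quantize_model(model, calib_dataloader)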

Model tree for playable/playable1-int4-bfloat16

  • Base model: Qwen/Qwen2.5-7B
  • Fine-tuned: playable/Playable1
  • Quantized: this model (one of 5 quantized variants)