Quantized Qwen 2.5 Coder 0.5B
The Qwen 2.5 Coder 0.5B base model is approximately 990 MB in size. This collection contains quantized versions of the model, created through selective quantization.
This model was created by selectively quantizing the Qwen2.5-Coder-0.5B base model to increase its speed while preserving its ability to generate relevant and accurate responses for Python programming. The quantization method kept the following layers at 32-bit precision:
The remaining layers were quantized to q3_k_l; a sketch of the resulting per-tensor plan is shown after the table below.
| Layer Name | Role (Short) | Type |
|---|---|---|
| q_proj, k_proj, v_proj | Compute query, key, and value for attention mechanism | Attention Proj |
| o_proj | Projects attention output back to model hidden size | Attention Proj |
| down_proj | Projects MLP output down to hidden size | MLP |
| gate_proj | First part of gated MLP, controls info flow | MLP |
| up_proj | Expands hidden size in MLP | MLP |
| lm_head | Final linear layer for logits | Output Head |
| embed_tokens | Token embedding layer | Input Embed |
| norm | Final layernorm | Normalization |
| *_layernorm | Normalize inputs to layers | Normalization |
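As a rough illustration of this split, the sketch below builds a per-tensor plan from the table as written: tensors whose names match the listed layers are marked to stay at 32-bit (f32), and everything else is marked for q3_k_l. This is only an assumption-level sketch; the actual GGUF conversion was presumably done with external quantization tooling, and the helper name build_quant_plan is illustrative, not the script used to produce these files.

```python
# Illustrative sketch only: maps each tensor of the base model to the
# target precision described above (f32 for the layers in the table,
# q3_k_l for the rest). The real conversion was done with GGUF tooling;
# this just makes the selection rule explicit.
from transformers import AutoModelForCausalLM

# Layer-name fragments listed in the table above.
KEEP_32BIT = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "lm_head", "embed_tokens", "layernorm", "norm",
)


def build_quant_plan(model_id: str = "Qwen/Qwen2.5-Coder-0.5B") -> dict:
    """Return {tensor_name: target_type} following the table as written."""
    model = AutoModelForCausalLM.from_pretrained(model_id)
    plan = {}
    for name, _param in model.named_parameters():
        keep = any(fragment in name for fragment in KEEP_32BIT)
        plan[name] = "f32" if keep else "q3_k_l"
    return plan


if __name__ == "__main__":
    for tensor_name, target in sorted(build_quant_plan().items()):
        print(f"{tensor_name:70s} {target}")
```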
```
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896, padding_idx=151665)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)
```
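The module tree above is the standard PyTorch printout of the base model. Assuming the transformers library is installed and the Qwen/Qwen2.5-Coder-0.5B checkpoint is accessible, it can be reproduced with:

```python
from transformers import AutoModelForCausalLM

# Load the base checkpoint and print its module tree
# (this reproduces the structure shown above).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
print(model)
```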