---
tags:
- causal-lm
- qwen
- fine-tuned
- quantized
- mnlp
---
# Qwen3-0.6B Full-Precision + W8A8 Quantized MCQA Model
**Repository:** [Kikinoking/MNLP_M2_quantized_model](https://huggingface.co/Kikinoking/MNLP_M2_quantized_model)
This is a fine-tuned Qwen3-0.6B causal LM, trained on a concatenation of several MCQA datasets and then quantized to 8-bit weights and activations (W8A8) in the compressed-tensors format. It is intended for multiple-choice QA tasks and is evaluated with the LightEval EPFL MNLP suite.
---
## Model Details
- **Base architecture:** Qwen3 (0.6B parameters)
- **Pretrained checkpoint:** `Qwen/Qwen3-0.6B-Base`
- **Fine-tuning data sources:**
- ScienceQA
- QASC
- OpenBookQA
- MathQA
- CommonsenseQA
- MCQA prompts generated via ChatGPT (labeled `M1_chatgpt`)
- **Dataset split:** 95% train / 5% validation (data preparation is sketched after this list)
- **Tokenization:**
- `AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")`
- Left padding, EOS token as pad_token
- Sequence length capped at 2048 tokens
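
A minimal sketch of the data preparation described above. The dataset objects here are stand-ins (each real MCQA set needs its own loading and normalization into a shared text format); the tokenizer settings match the list:

```python
from datasets import Dataset, concatenate_datasets
from transformers import AutoTokenizer

# Stand-ins for the real MCQA sets (ScienceQA, QASC, OpenBookQA, ...);
# each real set needs its own loading and normalization step first.
ds_a = Dataset.from_dict({"text": ["Question: ...\nAnswer: A"]})
ds_b = Dataset.from_dict({"text": ["Question: ...\nAnswer: C"]})

merged = concatenate_datasets([ds_a, ds_b])
splits = merged.train_test_split(test_size=0.05, seed=42)  # 95% train / 5% val

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
tokenizer.padding_side = "left"            # left padding
tokenizer.pad_token = tokenizer.eos_token  # EOS token as pad_token

def tokenize(batch):
    # Cap sequences at 2048 tokens
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = splits.map(tokenize, batched=True)
```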
---
## Quantization
- **Method:** `compressed-tensors` / `naive-quantized` (a hypothetical recipe is sketched after this list)
- **Precision:** 8-bit weights + 8-bit activations
- **Layers kept in FP32:** Language modeling head
- **Checkpoint:** Compatible with CPU and GPU inference
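
For reference, a hypothetical W8A8 recipe using the `llmcompressor` library, which writes checkpoints in the compressed-tensors format. The exact recipe used for this checkpoint is not published, so the modifier choice, calibration settings, and paths below are assumptions:

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Assumed recipe: 8-bit weights + 8-bit activations on all linear layers,
# keeping the LM head in full precision (as noted above).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head"],
)

oneshot(
    model="path/to/finetuned-qwen3-0.6b",  # hypothetical fine-tuned checkpoint
    dataset="open_platypus",               # illustrative calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="qwen3-0.6b-w8a8",
)
```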
---
## Evaluation
Tested using LightEval EPFL MNLP on the MCQA task:
```bash
lighteval accelerate --eval-mode lighteval --save-details \
  --override-batch-size 8 \
  --custom-tasks community_tasks/mnlp_mcqa_evals.py \
  --output-dir out/lighteval_quant \
  model_configs/quantized_model.yaml \
  "community|mnlp_mcqa_evals|0|0"
```
**Results:**

- **Accuracy:** 0.30 ± 0.15
- **Normalized accuracy:** 0.30 ± 0.15
---
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "Kikinoking/MNLP_M2_quantized_model", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Kikinoking/MNLP_M2_quantized_model",
    trust_remote_code=True,
    device_map="auto",  # places the model on GPU if available, else CPU
)
```
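
A minimal generation example (the prompt below is illustrative; any MCQA-style prompt works):

```python
prompt = (
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
# Decode only the newly generated tokens
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```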
---
## Limitations
- Being a 0.6B-parameter model, it may struggle with very long or ambiguous queries.
- Quantization can introduce a slight drop in accuracy (~5–10%).
---
## License
CC BY-NC 4.0 (inherits from the base Qwen3 license)