---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- merge
- model-merging
- mergekit
- lazymergekit
- qwen3
- 4b
- text-generation
- causal-lm
datasets:
- Idavidrein/gpqa
metrics:
- accuracy
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- Qwen/Qwen3-4B-Instruct-2507-FP8
- unsloth/Qwen3-4B-Instruct-2507
- huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated
- g-assismoraes/Qwen3-4B-Instruct-2507-imdb
- g-assismoraes/Qwen3-4B-Instruct-2507-assin2
- g-assismoraes/Qwen3-4B-Instruct-2507-faquad
- g-assismoraes/Qwen3-4B-Instruct-2507-hatebr
- g-assismoraes/Qwen3-4B-Instruct-2507-agnews
- BRlkl/BingoGuard-qwen3-4B-pt
base_model_relation: merge
model-index:
- name: qwen3-4b-merged---configuration-1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: cais/mmlu
      name: MMLU (Massive Multitask Language Understanding)
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: accuracy
      value: 72.51
      name: MMLU (5-shot)
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: Idavidrein/gpqa
      name: GPQA (Graduate-Level Google-Proof Q&A)
      config: gpqa_diamond
      split: test
      args:
        num_few_shot: 0
    metrics:
    - type: accuracy
      value: 45.45
      name: GPQA Diamond (0-shot)
      verified: false
---

# Qwen3-4B Merged - Configuration 0

This is a Qwen3-4B-based model created by layer-wise merging of multiple fine-tuned variants, with the merge configuration optimized for GPQA Diamond.

## Performance Metrics

| Benchmark | Score | Description |
|-----------|-------|-------------|
| **MMLU (5-shot)** | 0.7251 (72.51%) | Massive Multitask Language Understanding |
| **GPQA Diamond (0-shot)** | 0.4545 (45.45%) | Graduate-Level Google-Proof Q&A |

### Benchmark Details

- **MMLU**: Evaluated on the test set with 5-shot prompting across 57 subjects
- **GPQA**: Evaluated on the diamond subset with 0-shot prompting on graduate-level science questions (biology, physics, and chemistry)

## Performance Visualizations

### GPQA Diamond Performance Comparison

![GPQA Performance](./gpqa_performance.png)

### MMLU and GPQA Diamond Combined Performance

![MMLU and GPQA Performance](./mmlu_gpqa_performance.png)

## Model Information

- **Run ID**: 20250808_233922
- **Optimization Task**: GPQA (Graduate-Level Google-Proof Q&A)
- **Number of Layers**: 36
- **Base Architecture**: Qwen3-4B

## Source Models

The following models were used in the layer-wise merge:

- Qwen/Qwen3-4B-Instruct-2507
- Qwen/Qwen3-4B-Instruct-2507-FP8
- unsloth/Qwen3-4B-Instruct-2507
- huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated
- g-assismoraes/Qwen3-4B-Instruct-2507-imdb
- g-assismoraes/Qwen3-4B-Instruct-2507-assin2
- g-assismoraes/Qwen3-4B-Instruct-2507-faquad
- g-assismoraes/Qwen3-4B-Instruct-2507-hatebr
- g-assismoraes/Qwen3-4B-Instruct-2507-agnews
- BRlkl/BingoGuard-qwen3-4B-pt

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0")

# Example: MMLU-style question
prompt = '''Question: The study of the distribution and determinants of health and disease in populations is:
A) Epidemiology
B) Ecology
C) Etiology
D) Endocrinology
Answer:'''

# Move the tokenized inputs to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,  # caps generated tokens only, excluding the prompt
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Inference with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-1")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

prompts = ["Question: Explain quantum entanglement in simple terms."]
outputs = llm.generate(prompts, sampling_params)

# Print the generated text for each prompt
for output in outputs:
    print(output.outputs[0].text)
```

## Technical Details

This model uses a layer-wise merging approach: each transformer layer is taken whole from one of the source models, chosen according to the optimization criteria. This lets the merge combine strengths from multiple fine-tuned models.

### Merging Process

1. **Layer Selection**: Each layer (0-35 for this architecture) is independently selected from one of the source models
2. **Non-layer Weights**: Embeddings and final layers are taken from the base model
3. **Optimization**: The configuration was found through systematic optimization on the target benchmark

## Limitations

- This is an experimental merge, and performance may vary on tasks outside the optimization targets
- The model inherits limitations from its source models
- Performance on general tasks may differ from benchmark scores

## Citation

If you use this model, please cite the original source models and parrotrouter.com.

## Note

This model is provided for research purposes. Always validate performance on your specific use case before deployment.
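
## Illustrative Sketch: Layer-wise Selection

The first two steps of the Merging Process (per-layer selection, with non-layer weights kept from the base model) can be sketched roughly as below. This is not the pipeline used to build this model; it is a minimal illustration under stated assumptions. The function `merge_layerwise`, the helper `toy_state_dict`, and the source names `base`, `ft_a`, and `ft_b` are hypothetical, and the sketch assumes parameter keys of the form `model.layers.<index>.<rest>`. Strings stand in for weight tensors so the example runs without downloading any checkpoints.

```python
import re
from typing import Dict, List, Mapping

# Assumption: transformer-block parameters are named "model.layers.<index>.<rest>".
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def merge_layerwise(
    sources: Mapping[str, Mapping[str, object]],
    layer_assignment: List[str],
    base_name: str,
) -> Dict[str, object]:
    """Build a merged state dict, taking each layer from its assigned source.

    layer_assignment[i] names the source model that donates layer i.
    Keys outside the layer stack are copied from the base model.
    """
    base = sources[base_name]
    merged: Dict[str, object] = {}
    for key, base_value in base.items():
        match = LAYER_KEY.match(key)
        if match is None:
            # Non-layer weights (embeddings, final norm, LM head) come from the base.
            merged[key] = base_value
            continue
        layer_idx = int(match.group(1))
        if layer_idx >= len(layer_assignment):
            raise ValueError(f"no source assigned for layer {layer_idx}")
        donor = layer_assignment[layer_idx]
        merged[key] = sources[donor][key]
    return merged


def toy_state_dict(name: str, num_layers: int = 3) -> Dict[str, object]:
    """Hypothetical stand-in for a checkpoint: strings instead of tensors."""
    state = {"model.embed_tokens.weight": f"{name}:embed"}
    for i in range(num_layers):
        state[f"model.layers.{i}.mlp.up_proj.weight"] = f"{name}:L{i}"
    state["model.norm.weight"] = f"{name}:norm"
    return state


# Toy run: three layers, each taken from a different (hypothetical) source.
sources = {name: toy_state_dict(name) for name in ("base", "ft_a", "ft_b")}
merged = merge_layerwise(sources, ["ft_a", "base", "ft_b"], base_name="base")
for key, value in merged.items():
    print(key, "<-", value)
```

For the 36-layer architecture described above, `layer_assignment` would hold 36 entries, one source name per layer. The third step, searching for the assignment that scores best on the target benchmark, is outside the scope of this sketch.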