qwen3-8b-aijoah-magic8 made by "AIJOAH"

Subscribe to my YouTube channel AIJOAH.

By combining Qwen3-8B-Base (strong general language understanding) with DeepSeek-R1-0528-Qwen3-8B (powerful reasoning and code/math ability), this merge captures the best of both worlds.

No full model overwrite: instead of replacing the entire base model, DELLA injects only the delta weights (the differences) learned by the SFT model.

Lighter than LoRA: LoRA keeps extra adapter parameters around at inference time, while DELLA merges the delta directly into the base, so no extra layers or computation are added at runtime.

Faster than SFT: no supervised fine-tuning (SFT) run is required. DELLA merges already-learned changes, so there is no training time and deployment is much faster.

More memory-efficient: DELLA doesn't duplicate model parameters (as LoRA and adapters do), resulting in lower RAM and VRAM usage during inference.

Maintains base model stability: by merging only "what matters" (the fine-tuned deltas), the base model's stability and general language ability remain intact.

Extracts only what works: DELLA selectively transfers only the useful learned features from the fine-tuned SFT model, such as better instruction following, reasoning, and coding ability; a simplified sketch of this idea appears under "Merge Method" below.

Merge Method

This model was merged using the DELLA merge method, with Qwen/Qwen3-8B-Base as the base model.
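
To make the mechanics concrete, here is a minimal, heavily simplified sketch of the single-donor DELLA idea in PyTorch: extract the delta the SFT model learned on top of the base, keep only the strongest changes, and fold them back in. This is an illustration, not mergekit's actual implementation; real DELLA drops delta entries stochastically with probabilities inversely related to their magnitude, whereas this sketch uses a simple top-k cutoff, and della_style_merge and keep_fraction are illustrative names.

import torch

def della_style_merge(base_w, sft_w, keep_fraction=0.6):
    # 1) extract the delta: what fine-tuning changed relative to the base
    delta = sft_w - base_w
    # 2) keep only the highest-magnitude fraction of the delta
    #    (real DELLA samples stochastically, biased toward large magnitudes)
    k = max(1, int(keep_fraction * delta.numel()))
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    kept = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
    # 3) rescale the survivors (drop-and-rescale) and fold them into the base,
    #    so the merged tensor has exactly the base model's shape and size
    return base_w + kept / keep_fraction

Applied tensor by tensor across the two checkpoints, this yields a single merged model with the same parameter count as the base, which is why no extra layers or memory are needed at inference time.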

Models Merged

The following models were included in the merge:

Qwen/Qwen3-8B-Base (base)
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

Test

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "./qwen3-8b-aijoah-magic8"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# keep only the newly generated tokens, dropping the prompt
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
Example output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.20it/s]
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
thinking content: <think>
Okay, the user asked for a short introduction to large language models. Let me start by understanding their request. They want something brief, so I need to keep it concise but informative. 

First, I should define what LLMs are. They're AI systems trained on massive text data. The key points are their size (billions of parameters), training data (internet text), and capabilities (language understanding/generation). 

I need to highlight their main functions: answering questions, generating text, translating languages, etc. Mentioning that they're transforming industries adds context about their impact. 

Wait, the user might be a student or someone new to AI. They probably want a clear, jargon-free explanation. Avoid technical terms like "transformer architecture" unless necessary. 

Also, check if there's an unspoken need. Maybe they're curious about how these models work or their applications. But since the query is for a short intro, stick to the basics. 

Make sure the response is engaging but not overwhelming. Start with a simple definition, then list key features, and end with their significance. Keep it structured but natural. 

Double-check for clarity. Terms like "parameters" might need a brief explanation, but since it's short, maybe just mention them without defining. 

Alright, draft it out: Start with "What are LLMs?", explain their training, size, functions, and impact. Keep sentences short. That should cover the user's needs and any underlying curiosity.
</think>
content: Okay, here's a short introduction to Large Language Models (LLMs):

Large Language Models (LLMs) are sophisticated AI systems trained on massive amounts of text data from the internet. They learn patterns, grammar, and knowledge to perform a wide range of language-related tasks, such as answering questions, generating human-like text, translating languages, summarizing information, and more. Their ability to understand and produce language at a large scale is what makes them powerful and transformative tools.
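
Because the chat template exposes enable_thinking, the same script can also request a direct answer with no <think> block. Only the template call changes; since no </think> token (id 151668) is then generated, the rindex lookup above raises ValueError, index stays 0, and the whole response comes out as content:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # disable the reasoning trace for a direct answer
)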

Citation

If you find this work helpful, feel free to cite it:

AIJOAH

@misc{aijoah2025mergeddeepseekqwen3,
  title        = {Merged DeepSeek R1 and Qwen3-8B-Base using DELLA},
  author       = {aijoah},
  note         = {YouTube Channel: \url{https://www.youtube.com/@JayLee-gv8tv}},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/aijoah/merged-deepseek-qwen3-8b}}
}

Qwen3

@misc{qwen3technicalreport,
  title        = {Qwen3 Technical Report},
  author       = {Qwen Team},
  year         = {2025},
  eprint       = {2505.09388},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2505.09388}
}

DeepSeek-R1

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
  title        = {DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
  author       = {DeepSeek-AI},
  year         = {2025},
  eprint       = {2501.12948},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2501.12948}
}

Contact

If you have any questions, please raise an issue or contact us at [email protected].

Model size: 8.19B params · Tensor type: BF16 · Format: Safetensors