JEE NUJAN MIX V2 - Base Merged Model

Model Description

This is the base merged model for JEE mathematics problem solving, created by combining three specialized models using linear interpolation. This model serves as the foundation for further fine-tuning on mathematical datasets.

Model Architecture

Merged Models:

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (40% weight) - Advanced reasoning capabilities
Qwen/Qwen2.5-Math-1.5B (35% weight) - Mathematical problem solving
microsoft/phi-2 (25% weight) - General reasoning and language understanding

Merge Method: Linear interpolation with weight normalization
Output Format: Float16 for efficiency
Tokenizer: Based on DeepSeek-R1-Distill-Qwen-1.5B
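As a rough illustration of what "linear interpolation with weight normalization" means, the sketch below averages parameters key by key after rescaling the mixing weights to sum to 1. It is not the script used to build this model: the function names `normalize_weights` and `linear_merge` are hypothetical, and the sketch only merges parameters whose names appear in every checkpoint. The card does not state how parameters with mismatched names or shapes were handled. The values can be floats, NumPy arrays, or PyTorch tensors, since only `*` and `+` are used.

```python
def normalize_weights(weights):
    """Rescale mixing weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def linear_merge(state_dicts, weights):
    """Weighted average of parameters that share a name across all checkpoints.

    Assumption: every shared parameter has the same shape in each checkpoint.
    Parameters missing from any checkpoint are skipped.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per state dict")
    norm = normalize_weights(weights)

    # Keep only parameter names present in every checkpoint.
    shared = set(state_dicts[0])
    for sd in state_dicts[1:]:
        shared &= set(sd)

    merged = {}
    for name in sorted(shared):
        merged[name] = sum(w * sd[name] for w, sd in zip(norm, state_dicts))
    return merged


# Toy example with scalar "parameters" and the 40/35/25 split from this card.
toy_checkpoints = [
    {"layer.weight": 1.0, "only_in_first": 9.0},
    {"layer.weight": 2.0},
    {"layer.weight": 3.0},
]
merged = linear_merge(toy_checkpoints, [40, 35, 25])
print(merged)
```

With real checkpoints, the merged values would typically be cast to float16 before saving, matching the output format listed above.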

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the merged base model
tokenizer = AutoTokenizer.from_pretrained("shivs28/jee_nujan_mix_v2_base")
model = AutoModelForCausalLM.from_pretrained(
    "shivs28/jee_nujan_mix_v2_base", 
    torch_dtype=torch.float16,
    device_map="auto"
)

# Use for mathematical reasoning
prompt = "Solve: What is the derivative of x^2 + 3x + 1?"
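# Move inputs to the model's device (needed when device_map="auto" places the model on a GPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_length=200, do_sample=True, temperature=0.7)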
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Intended Use

This base model is intended to be:

Fine-tuned on specific mathematical datasets for enhanced performance
Used as a starting point for educational AI applications
Evaluated for mathematical reasoning capabilities
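For the evaluation use case, one simple option is exact-match scoring on the final number in each generated response, as is common for GSM8K-style problems. The sketch below is a minimal scorer under that assumption. The helper names `extract_final_number` and `exact_match_accuracy` are hypothetical, and the approach only suits problems with a single numeric answer, not symbolic or multiple-choice ones.

```python
import re

# Matches integers and decimals, with optional thousands separators and sign.
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number found in the text as a float, or None."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def exact_match_accuracy(predictions, references, tol=1e-6):
    """Fraction of responses whose final number equals the reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for response, expected in zip(predictions, references):
        value = extract_final_number(response)
        if value is not None and abs(value - expected) <= tol:
            correct += 1
    return correct / len(references)


# Illustrative responses; in practice these would come from model.generate.
responses = ["So the total is 1,234.5", "Therefore x = 42.", "I am not sure."]
answers = [1234.5, 41.0, 7.0]
print(exact_match_accuracy(responses, answers))
```

Taking the last number in the response is a heuristic: it assumes the model states its final answer at the end, which may not hold for every prompt format.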

Next Steps

This base model will be fine-tuned on comprehensive mathematical datasets including:

Competition Mathematics (MATH dataset)
GSM8K word problems
MathQA reasoning problems
AQuA-RAT algebraic problems
Custom JEE advanced problems

Model Card Authors

Created by the JEE NUJAN MIX team for educational purposes.

Citation

Please cite the original base models:

DeepSeek-R1-Distill-Qwen-1.5B
Qwen2.5-Math-1.5B
Phi-2

Tags: open-llm-leaderboard, math, gsm8k, casual-lm, fine-tuned

This model is part of the NUJAN educational initiative.

Safetensors

Model size: 2B params
Tensor type: F32

Model tree for shivs28/jee_nujan_mix_v2_base

Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-Math-1.5B
Finetuned: this model