Model Overview

This model is a fine-tuned version of the Qwen2.5-3B base model, adapted with Low-Rank Adaptation (LoRA) using the MLX framework. Fine-tuning ran for 600 iterations on the isaiahbjork/chain-of-thought dataset (7,143 examples), with the goal of improving performance on tasks that require multi-step reasoning and problem-solving.
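
As a quick illustration of the intended use, the sketch below loads the model with mlx-lm and poses a multi-step word problem. It assumes mlx-lm is installed and that the published weights can be consumed by mlx_lm.load (if only GGUF files are hosted, point load() at a local MLX or safetensors export instead); the prompt is illustrative, not taken from the training data.

```python
# Minimal usage sketch, assuming `pip install mlx-lm` and weights loadable by mlx_lm.load.
from mlx_lm import load, generate

model, tokenizer = load("ApatheticWithoutTheA/Qwen-2.5-3B-Reasoning")

prompt = (
    "A train covers 180 km in 2.5 hours. "
    "How long will it take to cover 300 km at the same speed?"
)
answer = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(answer)
```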

Model Architecture

  • Base Model: Qwen2.5-3B
  • Model Type: Causal Language Model
  • Architecture: Transformer with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
  • Parameters: 3.09 billion
  • Layers: 36
  • Attention Heads: 16 query heads, 2 key/value heads (grouped-query attention, GQA); see the config check after this list
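
The architecture numbers quoted above can be verified directly from the base model's configuration. This is a small sketch assuming the transformers library is installed; the attribute names follow the Qwen2 config class.

```python
# Inspect the Qwen2.5-3B config to confirm layer count, head layout, and tied embeddings.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-3B")
print(cfg.num_hidden_layers)     # 36 decoder layers
print(cfg.num_attention_heads)   # 16 query heads
print(cfg.num_key_value_heads)   # 2 key/value heads (grouped-query attention)
print(cfg.tie_word_embeddings)   # True -- input and output embeddings are tied
```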

Fine-Tuning Details

  • Technique: Low-Rank Adaptation (LoRA)
  • Framework: MLX
  • Dataset: isaiahbjork/chain-of-thought
  • Dataset Size: 7,143 examples
  • Iterations: 600

LoRA fine-tunes the model by training small low-rank adapter matrices rather than the full parameter set, which cuts compute and memory requirements while preserving quality. The MLX framework ran this training natively on Apple silicon. A data-preparation and launch sketch is given below.
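
The sketch below is a hypothetical reconstruction of such a run, not the author's exact script. It converts the Hub dataset into the {"text": ...} JSONL layout that mlx_lm.lora reads from a --data directory; the column names "prompt" and "response" and the 95/5 split are assumptions, so inspect ds.column_names before running.

```python
# Hypothetical data preparation for an mlx-lm LoRA run (assumed column names and split).
import json
from pathlib import Path

from datasets import load_dataset

ds = load_dataset("isaiahbjork/chain-of-thought", split="train")

out_dir = Path("data")
out_dir.mkdir(exist_ok=True)

split = int(len(ds) * 0.95)  # simple 95/5 train/validation split (an assumption)
with open(out_dir / "train.jsonl", "w") as train_f, open(out_dir / "valid.jsonl", "w") as valid_f:
    for i, row in enumerate(ds):
        text = f"{row['prompt']}\n{row['response']}"  # assumed column names
        (train_f if i < split else valid_f).write(json.dumps({"text": text}) + "\n")

# The LoRA run itself would then be launched with the mlx-lm CLI, roughly:
#   python -m mlx_lm.lora --model Qwen/Qwen2.5-3B --train --data ./data \
#       --iters 600 --adapter-path adapters
```

After training, the resulting adapters can be fused into the base weights or loaded alongside them at inference time, depending on how the checkpoint is to be distributed.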
