---
library_name: transformers
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: question-answering
---

# Model Card for OpenAI GSM8K Dataset Enhanced with Reasoning

This model is fine-tuned to answer questions from the OpenAI GSM8K dataset, enhanced with reasoning traces generated by DeepSeek R1. A publicly available Colab notebook for testing the model is shared [here](https://colab.research.google.com/drive/1B_Fbz0w76QxHbo9zAOf_pyZKKNI0EJJ9?usp=sharing).

---

## Model Details

### Model Description

This is a transformer-based question-answering model fine-tuned from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`. It was trained on a dataset derived from the OpenAI GSM8K benchmark, enhanced with chain-of-thought reasoning to encourage intermediate logical steps. The dataset pairs math word problems with structured answers, using `<think>...</think>` and `<answer>...</answer>` tags.

- **Developed by:** Yiqiao Yin
- **Model type:** Causal Language Model (fine-tuned for Q&A with reasoning)
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

---

## Training Configuration

- 🖥️ **Hardware:** Trained on a RunPod instance with:
  - 🔥 6 × NVIDIA H100 PCIe GPUs
  - 🧠 144 vCPUs
  - 🧮 1132 GB system RAM
  - 💽 20 GB disk per GPU
- 🐳 **Container Image:** `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04`
- ⏱️ **Total Training Time:** 2 hours
- 💸 **Cost:** ~$14/hour × 2 hours = **$28 USD**
- ⚙️ **Zero Redundancy Optimization:** DeepSpeed ZeRO Stage 1 (a configuration sketch appears at the end of this card)
- 🎯 **Precision:** FP16 mixed-precision training

---

## Performance

- **Mean token-level accuracy:** **97%**
- Evaluation is based on in-training token-match accuracy over the formatted `<think>...</think><answer>...</answer>` structure (a sketch of this metric appears at the end of this card).
- The model demonstrates strong reasoning capability on multi-step arithmetic and logic problems.

---

## Inference Format

To generate accurate completions, prompt the model in the following structure:

```
Question: If Sally has 3 apples and buys 2 more, how many does she have in total?
```

Be aware that the model is trained to begin its response with a `<think>` tag: it will continue reasoning within `<think>...</think>` and then provide a final answer inside `<answer>...</answer>`. A runnable usage sketch is included at the end of this card.

---

## Intended Use

This model is intended for educational and research purposes in:

- Chain-of-thought prompting
- Math reasoning and logical inference
- Question answering with intermediate steps

---

## Limitations

- Trained on structured synthetic data; real-world generalization may vary
- Best performance is achieved when following the exact inference format
- Does not support multilingual inputs

---

## Citation

If you use this model, please cite:

```
@misc{yin2024gsm8k,
  author = {Yiqiao Yin},
  title  = {TBD},
  year   = 2025,
  note   = {TBD}
}
```

---

## Model Card Contact

Author: Yiqiao Yin

Connect with me on [LinkedIn](https://www.linkedin.com/in/yiqiaoyin/)
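---

## Example Usage (Sketch)

For convenience, here is a minimal sketch of running the model with 🤗 Transformers, following the inference format described above. The repository ID `your-username/your-finetuned-model` is a placeholder (this card does not state the published model ID), and the generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the actual fine-tuned model repository ID.
MODEL_ID = "your-username/your-finetuned-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # matches the FP16 training precision
    device_map="auto",
)

# Follow the trained prompt structure: a plain "Question: ..." line.
prompt = "Question: If Sally has 3 apples and buys 2 more, how many does she have in total?\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# The completion should contain <think>...</think> reasoning followed by
# the final answer inside <answer>...</answer>.
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```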
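---

## DeepSpeed Configuration (Sketch)

The exact training configuration is not published. As an illustration, the snippet below shows how a DeepSpeed ZeRO Stage 1 + FP16 setup (the two settings this card does state) could be passed to the 🤗 `Trainer`; all other fields are assumed `"auto"` placeholders.

```python
from transformers import TrainingArguments

# Only ZeRO Stage 1 and FP16 are stated in this card; the remaining
# fields are illustrative placeholders resolved by the Trainer.
ds_config = {
    "zero_optimization": {"stage": 1},  # ZeRO Stage 1: shard optimizer states
    "fp16": {"enabled": True},          # FP16 mixed-precision training
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="outputs",
    fp16=True,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)
```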
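---

## Token-Level Accuracy Metric (Sketch)

The 97% figure above is described as in-training token-match accuracy; the precise implementation is not given. The function below is one standard way to compute mean token-level accuracy for a causal LM, assuming 🤗-style labels in which `-100` marks ignored positions.

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Mean token-level accuracy for a causal LM batch.

    logits: (batch, seq_len, vocab_size) model outputs
    labels: (batch, seq_len) target ids, with -100 at ignored positions
    """
    # Shift by one so each position predicts the next token,
    # mirroring the causal language-modeling loss.
    preds = logits[:, :-1, :].argmax(dim=-1)
    targets = labels[:, 1:]
    mask = targets != -100  # skip padding / masked positions
    correct = (preds == targets) & mask
    return (correct.sum() / mask.sum()).item()
```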