🧠 eagle0504/qwen-distilled-scout-1.5b

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, enhanced with chain-of-thought (CoT) supervision on a hybrid dataset that combines GSM8K-style math reasoning with structured text-to-SQL generation.

Fine-tuning was conducted with DeepSpeed on a multi-A100 GPU setup via RunPod, enabling efficient training in memory-constrained environments. The training data consists of synthetically generated SQL queries with nontrivial logic, each paired with a natural-language prompt and a CoT explanation.

For inference, please see this publicly available notebook.


🧾 Model Details

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Language: English
  • Architecture: Causal Language Model (Decoder-only)
  • Tokenizer: AutoTokenizer from base model
  • Parameter Count: 1.5 Billion
  • Training Framework: 🤗 Transformers + DeepSpeed
  • Compute Environment: RunPod (6x A100 SXM, 192 vCPU, 1.5TB RAM)

🧪 Training Dataset

Dataset used: a custom chain-of-thought + SQL-reasoning dataset containing structured training examples of the form:

<question>...</question>
<think>...</think>
<response>...</response>

Each example is constructed to include:

  • A natural language question about tabular data (sql_prompt)
  • An intermediate reasoning step (sql_explanation)
  • The final SQL output (sql)

This format allows the model to internalize step-by-step logical reasoning for SQL generation.
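
As an illustration only (the exact preprocessing script is not published), a record with the fields listed above could be rendered into this template roughly as follows; the function name and everything beyond the three field names is an assumption:

def format_example(record: dict) -> str:
    # Sketch of the template described above; field names follow the card
    # (sql_prompt, sql_explanation, sql), everything else is hypothetical.
    return (
        f"<question>{record['sql_prompt']}</question>\n"
        f"<think>{record['sql_explanation']}</think>\n"
        f"<response>{record['sql']}</response>"
    )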


📊 Fine-Tuning Summary

The base model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B was fine-tuned on three different datasets using DeepSpeed across various RunPod infrastructure setups. Below is a consolidated summary of the training configurations and results:

| Model ID | Dataset Description | GPUs | vCPUs | RAM (GB) | Disk per GPU | Container Image | Duration | Cost | DeepSpeed Stage | Precision | Mean Token Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| eagle0504/finetuned-deepseek-r1-distill-qwen-1.5b-by-openai-gsm8k-enhanced-v2 | OpenAI GSM8K Enhanced v2 | 6 × H100 PCIe | 144 | 1132 | 20 GB | runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 | 2 hrs | ~$28 | Stage 1 | FP16 | 98% |
| eagle0504/openai-gsm8k-codealpaca-20k-enhanced-deepseek-r1-distill-qwen-1.5b | GSM8K + CodeAlpaca-20K Enhanced | 4 × A100 SXM | 146 | 1144 | 20 GB | runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 | 2 hrs | ~$14+ | Stage 1 | FP16 | 97% |
| eagle0504/qwen-distilled-scout-1.5b | Custom CoT + SQL-Reasoning | 6 × A100 SXM | 192 | 1536 | 20 GB | runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 | 1.5 hrs | ~$21 | Stage 2 | FP16 | 97% |
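
The exact DeepSpeed configuration is not published with this card. As a rough sketch only, a ZeRO Stage 2 + FP16 setup matching the last row above could be expressed as a Python dict and passed to the 🤗 Trainer via TrainingArguments(deepspeed=...); every concrete value below is an assumption:

from transformers import TrainingArguments

ds_config = {
    # "auto" lets the 🤗 Trainer fill in batch-size-related values
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": True},      # matches the FP16 precision reported above
    "zero_optimization": {
        "stage": 2,                 # ZeRO Stage 2, as in the table
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

training_args = TrainingArguments(
    output_dir="./qwen-distilled-scout-1.5b",  # hypothetical output path
    per_device_train_batch_size=4,             # assumed; not from the card
    fp16=True,
    deepspeed=ds_config,                       # DeepSpeed config passed as a dict
)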

🧮 Evaluation Metric

The model is evaluated with a custom token-level accuracy metric:

  • Metric: Mean token-level accuracy
  • Definition: Accuracy over all non-masked tokens (labels != -100)
  • Implementation: NumPy-based vectorized comparison between predicted tokens and ground truth (a sketch follows below)
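
The exact metric code is not included in this card; the following NumPy sketch illustrates the idea (function and argument names are assumptions):

import numpy as np

def mean_token_accuracy(predictions: np.ndarray, labels: np.ndarray) -> float:
    # Illustrative sketch, not the exact training-time implementation.
    # predictions: (batch, seq_len) argmax token ids; labels: (batch, seq_len)
    # with ignored positions set to -100, as in the 🤗 data collators.
    mask = labels != -100                      # keep only supervised tokens
    correct = (predictions == labels) & mask   # element-wise match on kept tokens
    return correct.sum() / mask.sum()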

🚀 Use Case

The model is designed for chain-of-thought text-to-SQL generation, useful in:

  • AI teaching agents
  • Conversational agents with data query capabilities
  • Auto SQL generation tools for tabular backends
  • Educational applications in logical reasoning

📦 How to Use

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)
import torch


class StopOnTokens(StoppingCriteria):
    """Stop generation once the output ends with any of the given stop-token sequences."""

    def __init__(self, stop_token_ids: list):
        super().__init__()
        self.stop_token_ids = stop_token_ids  # list of token-id lists

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Compare the tail of the generated sequence against each stop sequence
        return any(
            input_ids[0, -len(token):].tolist() == token for token in self.stop_token_ids
        )


model = AutoModelForCausalLM.from_pretrained("eagle0504/qwen-distilled-scout-1.5b")
tokenizer = AutoTokenizer.from_pretrained("eagle0504/qwen-distilled-scout-1.5b")

# Stop generating once the model emits the closing </response> tag
stop_sequence = "</response>"
stop_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
stopping_criteria = StoppingCriteriaList([StopOnTokens([stop_ids])])

# Wrap the prompt in the <question> tag used during training
inputs = tokenizer(
    "<question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?</question>",
    return_tensors="pt",
)

outputs = model.generate(
    **inputs,
    max_new_tokens=230,
    stopping_criteria=stopping_criteria,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
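
Since generation stops at the closing tag, the final answer can be pulled out of the decoded text with simple string handling. A minimal sketch, assuming the output follows the <response>...</response> convention the model was trained on:

# Extract only the content of the <response> block from the decoded output
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
if "<response>" in text:
    answer = text.split("<response>", 1)[1].split("</response>", 1)[0].strip()
else:
    answer = text  # fall back to the full decoded output
print(answer)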

📊 Limitations

  • The model is tuned for text-to-SQL tasks with CoT supervision and may not generalize well to free-form text generation or other domains without additional fine-tuning.
  • Maximum input length is 1,024 tokens; longer contexts will be truncated (see the truncation sketch below).
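
As a usage sketch (long_question is a placeholder, not from the card), prompts can be capped explicitly at the 1,024-token limit at tokenization time:

# Sketch: explicitly truncate long prompts to the 1,024-token limit noted above
inputs = tokenizer(
    long_question,          # a hypothetical <question>...</question> string
    return_tensors="pt",
    truncation=True,
    max_length=1024,
)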

🧑‍💻 Author

  • Name: Yiqiao Yin
  • Hugging Face: eagle0504
  • Organization: WYN AI / Independent AI Researcher

📝 Citation

If you use this model in your work, please cite:

@misc{yin2025enhanceddeepseek,
  title={Enhanced DeepSeek-R1-Distill-Qwen-1.5B Fine-tuned on GSM8K + CoT SQL},
  author={Yiqiao Yin},
  year={2025},
  howpublished={\url{https://huggingface.co/eagle0504/enhanced-deepseek-r1-distill-qwen-1.5b-finetuned-on-gsm8k-codealpaca20k-text2sql}},
}

📬 Contact

For questions or collaborations, reach out via LinkedIn or email: [email protected]
