Model Overview
Description:
Nemotron-Research-Reasoning-Qwen-1.5B is the world's leading 1.5B open-weight model for complex reasoning tasks such as mathematical problems, coding challenges, and scientific questions. It is trained with the Group Relative Policy Optimization (GRPO) algorithm on a diverse and comprehensive collection of datasets, and it outperforms its base model, DeepSeek-R1-Distill-Qwen-1.5B, by a large margin on a broad range of tasks including math, coding, and GPQA.
This model is for research and development only.
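For reference, GRPO replaces PPO's learned value baseline with a group-relative advantage: for each question $q$, a group of $G$ responses is sampled from the previous policy, each response is scored with a reward $r_i$, and the advantage is the group-normalized reward. A sketch of the standard GRPO objective follows; the exact formulation, reward design, and hyperparameters used for this model will be detailed in the accompanying paper.

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})}
$$

$$
\mathcal{J}_{\text{GRPO}}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\min\!\Big(\rho_i\hat{A}_i,\ \operatorname{clip}\big(\rho_i,\,1-\epsilon,\,1+\epsilon\big)\hat{A}_i\Big)\right] - \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\text{ref}}\right),
\qquad \rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\text{old}}}(o_i \mid q)}
$$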
License/Terms of Use
TBD
Deployment Geography:
Global
Use Case:
Researchers and developers can use this model to solve math, coding and STEM questions.
Release Date:
Hugging Face x/xx/2025 via [URL]
Reference(s):
A technical paper will be released alongside the model.
Model Architecture:
Architecture Type: Dense decoder-only Transformer model
Network Architecture: DeepSeek-R1-Distill-Qwen-1.5B
**This model was developed based on DeepSeek-R1-Distill-Qwen-1.5B.**
Input:
Input Type(s): Text
Input Format: String
Input Parameters: 1D
Other Properties Related to Input: Context length up to 32,000 tokens
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D
Other Properties Related to Output: Context length up to 32,000 tokens
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s): Transformers
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Hopper
Preferred/Supported Operating System(s):
- Linux
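A minimal inference sketch with the Transformers runtime is shown below; the repository id, chat-template usage, and sampling settings are illustrative assumptions rather than official recommendations.

```python
# Minimal inference sketch with the Transformers runtime.
# The repository id and sampling settings below are assumptions,
# not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Research-Reasoning-Qwen-1.5B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # runs comfortably on Ampere/Hopper GPUs
    device_map="auto",
)

prompt = "Find all positive integers n such that n^2 + 12n - 2007 is a perfect square."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,  # context length supports up to ~32,000 tokens
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```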
Model Version(s):
1.0
Training, Testing, and Evaluation Datasets:
** The total size (in number of data points): 479K
** Total number of datasets: 5
** Dataset partition: Training [90%], testing [5%], validation [5%]
** Time period for training data collection [1984-01-01 to 2023-01-01]
** Time period for testing data collection [2024-01-01 to 2025-04-01]
** Time period for validation data collection [2024-01-01 to 2025-04-01]
Training Dataset:
Link:
| Dataset | Link |
|---|---|
| DeepScaleR-Preview-Dataset | Link |
| Eurus-2-RL-Data | Link |
| Reasoning-gym | Link |
| IFEval | Link |
| SCP-116K | Link |
Data Collection Method by dataset:
- Hybrid: Automated, Human, Synthetic
Labeling Method by dataset:
- Hybrid: Automated, Human, Synthetic
Properties (Quantity, Dataset Descriptions, Sensor(s)): 479K question and answer pairs
Testing Dataset:
Link:
| Dataset | Link |
|---|---|
| DeepScaleR-Preview-Dataset | Link |
| Eurus-2-RL-Data | Link |
| Reasoning-gym | Link |
| IFEval | Link |
| SCP-116K | Link |
Data Collection Method by dataset:
- Hybrid: Automated, Human, Synthetic
Labeling Method by dataset:
- Hybrid: Automated, Human, Synthetic
Properties (Quantity, Dataset Descriptions, Sensor(s)): 22K question and answer pairs
Evaluation Dataset:
Link:
- AIME: https://huggingface.co/datasets/opencompass/AIME2025
- AMC: https://huggingface.co/datasets/AI-MO/aimo-validation-amc
**Benchmark Score
| Dataset | Score |
|---|---|
| AIME | 48.1 |
| AMC | 79.3 |
Data Collection Method by dataset:
- Automated
Labeling Method by dataset:
- Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): 100 math question and answer pairs
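The benchmark scores above are accuracy numbers on the held-out math sets. As a hedged illustration of how such a score can be computed (not necessarily the exact protocol used here), one common approach averages per-problem correctness over several sampled generations; `is_correct` below is a hypothetical answer checker, e.g. exact match on the final boxed answer.

```python
# Hedged sketch: average accuracy estimated over k samples per problem.
# `is_correct(answer, reference)` is a hypothetical checker; the real
# evaluation protocol for this model may differ.
from statistics import mean

def average_accuracy(samples_per_problem, references, is_correct):
    """samples_per_problem: list of lists of model answers (k per problem)."""
    per_problem = []
    for answers, ref in zip(samples_per_problem, references):
        per_problem.append(mean(1.0 if is_correct(a, ref) else 0.0 for a in answers))
    return 100.0 * mean(per_problem)  # percentage, comparable to the table above
```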
Inference:
Acceleration Engine: Transformers
Test Hardware:
- 1x H100-80GB GPU
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.