Model Overview
Description:
Nemotron-Research-Reasoning-Qwen-1.5B is the world's leading 1.5B open-weight model for complex reasoning tasks such as mathematical problems, coding challenges, and scientific questions. It is trained with the Group Relative Policy Optimization (GRPO) algorithm on a diverse and comprehensive collection of datasets, and it outperforms its base model, DeepSeek-R1-Distill-Qwen-1.5B, by a large margin on a broad range of tasks including math, coding, and GPQA.
This model is for research and development only.
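For reference, GRPO replaces PPO's learned value baseline with a group-relative advantage: for each question $q$, a group of $G$ responses is sampled from the previous policy, each response is scored with a reward $r_i$, and the advantage is the group-normalized reward. A sketch of the standard GRPO objective follows; the exact formulation, reward design, and hyperparameters used for this model will be detailed in the accompanying paper.

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})}
$$

$$
\mathcal{J}_{\text{GRPO}}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\min\!\Big(\rho_i\hat{A}_i,\ \operatorname{clip}\big(\rho_i,\,1-\epsilon,\,1+\epsilon\big)\hat{A}_i\Big)\right] - \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\text{ref}}\right),
\qquad \rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\text{old}}}(o_i \mid q)}
$$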
License/Terms of Use
TBD
Deployment Geography:
Global
Use Case:
Researchers and developers can use this model to solve math, coding and STEM questions.
Release Date:
Hugging Face x/xx/2025 via [URL]
Reference(s):
A technical paper will be released alongside the model.
Model Architecture:
Architecture Type: Dense decoder-only Transformer model
Network Architecture: DeepSeek-R1-Distill-Qwen-1.5B
**This model was developed based on DeepSeek-R1-Distill-Qwen-1.5B.**
Input:
Input Type(s): Text
Input Format: String
Input Parameters: 1D
Other Properties Related to Input: Context length up to 32,000 tokens
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D
Other Properties Related to Output: Context length up to 32,000 tokens
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s): Transformers
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Hopper
Preferred/Supported Operating System(s):
- Linux
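A minimal inference sketch with the Transformers runtime is shown below; the repository id, chat-template usage, and sampling settings are illustrative assumptions rather than official recommendations.

```python
# Minimal inference sketch with the Transformers runtime.
# The repository id and sampling settings below are assumptions,
# not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Research-Reasoning-Qwen-1.5B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # runs comfortably on Ampere/Hopper GPUs
    device_map="auto",
)

prompt = "Find all positive integers n such that n^2 + 12n - 2007 is a perfect square."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,  # context length supports up to ~32,000 tokens
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```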
Model Version(s):
1.0
Training, Testing, and Evaluation Datasets:
** The total size (in number of data points): 479K
** Total number of datasets: 5
** Dataset partition: Training [90%], testing [5%], validation [5%]
** Time period for training data collection [1984-01-01 to 2023-01-01]
** Time period for testing data collection [2024-01-01 to 2025-04-01]
** Time period for validation data collection [2024-01-01 to 2025-04-01]
Training Dataset:
Link:
| Dataset | Link |
|---|---|
| DeepScaleR-Preview-Dataset | Link |
| Eurus-2-RL-Data | Link |
| Reasoning-gym | Link |
| IFEval | Link |
| SCP-116K | Link |
Data Collection Method by dataset:
- Hybrid: Automated, Human, Synthetic
Labeling Method by dataset:
- Hybrid: Automated, Human, Synthetic
Properties (Quantity, Dataset Descriptions, Sensor(s)): 479K question and answer pairs
Testing Dataset:
Link:
| Dataset | Link |
|---|---|
| DeepScaleR-Preview-Dataset | Link |
| Eurus-2-RL-Data | Link |
| Reasoning-gym | Link |
| IFEval | Link |
| SCP-116K | Link |
Data Collection Method by dataset:
- Hybrid: Automated, Human, Synthetic
Labeling Method by dataset:
- Hybrid: Automated, Human, Synthetic
Properties (Quantity, Dataset Descriptions, Sensor(s)): 22K question and answer pairs
Evaluation Dataset:
Link:
- AIME: https://huggingface.co/datasets/opencompass/AIME2025
- AMC: https://huggingface.co/datasets/AI-MO/aimo-validation-amc
**Benchmark Score
| Dataset | Score |
|---|---|
| AIME | 48.1 |
| AMC | 79.3 |
Data Collection Method by dataset:
- Automated
Labeling Method by dataset:
- Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): 100 math question and answer pairs
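The benchmark scores above are accuracy numbers on the held-out math sets. As a hedged illustration of how such a score can be computed (not necessarily the exact protocol used here), one common approach averages per-problem correctness over several sampled generations; `is_correct` below is a hypothetical answer checker, e.g. exact match on the final boxed answer.

```python
# Hedged sketch: average accuracy estimated over k samples per problem.
# `is_correct(answer, reference)` is a hypothetical checker; the real
# evaluation protocol for this model may differ.
from statistics import mean

def average_accuracy(samples_per_problem, references, is_correct):
    """samples_per_problem: list of lists of model answers (k per problem)."""
    per_problem = []
    for answers, ref in zip(samples_per_problem, references):
        per_problem.append(mean(1.0 if is_correct(a, ref) else 0.0 for a in answers))
    return 100.0 * mean(per_problem)  # percentage, comparable to the table above
```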
Inference:
Acceleration Engine: Transformers
Test Hardware:
- 1x H100-80GB GPU
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.