Introducing Cosmos Predict-2: A Foundation For Your Own World Model
Building smarter robots and autonomous vehicles (AVs) starts with foundation models that understand real-world dynamics. These models serve two critical roles:
- Accelerating synthetic data generation (SDG) to teach machines about real-world physics and interactions, including rare edge cases
- Serving as base models that can be post-trained for specialized tasks or adapted to different output types
Cosmos Predict-1 was built for this purpose, generating realistic, physics-aware future world states.
Now, Cosmos Predict-2 introduces major upgrades in speed, visual quality, and customization.
Introducing Cosmos Predict-2
Cosmos Predict-2 is our top-performing world foundation model for physical AI. Architectural refinements improve speed and scalability and allow flexible resolution and framerate across use cases and hardware platforms.
Two model variants are available, optimized for task complexity:
Cosmos Predict-2 2B
Fast inference and low memory usage. Ideal for prototyping, low-latency applications, and edge deployments.

Cosmos Predict-2 14B
Designed for high-fidelity world modeling, complex scene understanding, extended temporal coherence, and prompt precision.
Developers can start with the video2world model by using a reference image of a robot or AV environment to generate consistent, physically accurate world states as video. The text-to-image model can also create a preview image from a text prompt.
Resolution & Framerate Options
Cosmos Predict-2 offers flexible output formats:
Resolution
- 720p
- 480p for faster throughput
Framerate
- Available: 10 fps, 16 fps
- Coming soon: 24 fps (ideal for 10 Hz simulation and AV training pipelines)
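The example checkpoint shipped with the repo encodes the output format in its filename (`model-720p-16fps.pt`). A tiny helper along those lines can make the format choice explicit; note that the helper name and the assumption that every resolution/framerate combination follows this naming scheme are illustrative, not confirmed by the docs:

```python
# Hypothetical helper: build a checkpoint filename from the desired output
# format. The naming scheme (model-{res}p-{fps}fps.pt) is inferred from the
# single 720p/16fps example checkpoint; other combinations are an assumption.
def checkpoint_name(resolution: int, fps: int) -> str:
    supported = {(720, 16), (720, 10), (480, 16), (480, 10)}
    if (resolution, fps) not in supported:
        raise ValueError(f"unsupported format: {resolution}p @ {fps} fps")
    return f"model-{resolution}p-{fps}fps.pt"

print(checkpoint_name(720, 16))  # model-720p-16fps.pt
```

Pass the resulting name as the final path component of `dit_path` when constructing the pipeline.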
Inference & Performance
Cosmos Predict-2 is designed for fast, flexible inference across hardware setups.
- 2B variant: fast performance on NVIDIA GB200 NVL72, DGX B200, RTX PRO 6000, or RTX 6000 Ada.
- 14B variant: higher fidelity for complex, temporally coherent tasks on GB200/B200 systems.
```python
import torch
from imaginaire.utils.io import save_image_or_video
from cosmos_predict2.configs.base.config_video2world import PREDICT2_VIDEO2WORLD_PIPELINE_2B
from cosmos_predict2.pipelines.video2world import Video2WorldPipeline

# Create the video generation pipeline.
pipe = Video2WorldPipeline.from_config(
    config=PREDICT2_VIDEO2WORLD_PIPELINE_2B,
    dit_path="checkpoints/nvidia/Cosmos-Predict2-2B-Video2World/model-720p-16fps.pt",
    text_encoder_path="checkpoints/google-t5/t5-11b",
)

# Specify the input image path and text prompt.
image_path = "assets/video2world/example_input.jpg"
prompt = "A high-definition video captures the precision of robotic welding in an industrial setting. The first frame showcases a robotic arm, equipped with a welding torch, positioned over a large metal structure. The welding process is in full swing, with bright sparks and intense light illuminating the scene, creating a vivid display of blue and white hues. A significant amount of smoke billows around the welding area, partially obscuring the view but emphasizing the heat and activity. The background reveals parts of the workshop environment, including a ventilation system and various pieces of machinery, indicating a busy and functional industrial workspace. As the video progresses, the robotic arm maintains its steady position, continuing the welding process and moving to its left. The welding torch consistently emits sparks and light, and the smoke continues to rise, diffusing slightly as it moves upward. The metal surface beneath the torch shows ongoing signs of heating and melting. The scene retains its industrial ambiance, with the welding sparks and smoke dominating the visual field, underscoring the ongoing nature of the welding operation."

# Run the video generation pipeline.
video = pipe(input_path=image_path, prompt=prompt)

# Save the resulting output video.
save_image_or_video(video, "output/test.mp4", fps=16)
```
Full setup instructions: nvidia-cosmos/cosmos-predict2 GitHub repo
Post-train Cosmos Predict-2 for your use case
Cosmos Predict-2 can be post-trained for custom applications such as robotics, AVs, and industrial automation. Post-training use cases include:
| Domain | Post-training Task | Example Application |
|---|---|---|
| Robotics | Instruction control, object manipulation | Pick apples with varying stem strength |
| AVs | Multiview generation, edge-case simulation | Rainy highway driving with lidar/camera sync |
| Industrial | Action-conditioned workflows | Predictive maintenance for conveyor belt robots |
| Vision | Camera pose conditioning | 3D-consistent video from single images |
Let's say you need to perform post-training for the first example from the table above.
Step 1: Prepare the Data Using Open Source Data Curator
- Collect 100+ hours of representative teleoperation data for your task
- Use Cosmos Curate to process, analyze, and organize the video content.
- You will need accurate text+video pairings for the next step.
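Step 1's output is a set of accurate text+video pairings. A minimal sketch of assembling those pairings into a JSONL manifest might look like the following; the directory layout (each `clip.mp4` next to a same-named `clip.txt` caption) and the function name are assumptions for illustration, not the Cosmos Curate format:

```python
import json
from pathlib import Path

def build_manifest(data_dir: str, out_path: str) -> int:
    """Pair each clip with its same-named caption file and write a JSONL manifest.

    Layout assumption (hypothetical): clip_0001.mp4 sits next to clip_0001.txt.
    Clips without a caption are skipped, since the next step needs accurate
    text+video pairings.
    """
    data = Path(data_dir)
    pairs = []
    for video in sorted(data.glob("*.mp4")):
        caption = video.with_suffix(".txt")
        if not caption.exists():
            continue  # no text pairing available for this clip
        pairs.append({"video": str(video), "text": caption.read_text().strip()})
    with open(out_path, "w") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")
    return len(pairs)
```

The return value (number of usable pairs) is a quick sanity check that most of your collected footage actually carries a caption before you start post-training.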
Step 2: Post-train the Model
- Use the reference training scripts in the GitHub repo
- Fine-tune on your robot/environment using curated data.
Step 3: Generate Synthetic Scenarios
- Example Prompt:
"Pick up the bruised apple under low light"
- You can also condition generation on an input image
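Edge-case coverage usually comes from sweeping prompt attributes rather than writing prompts one at a time. A small sketch of such a sweep, where the attribute lists are purely illustrative (they are not from the Cosmos docs):

```python
from itertools import product

# Hypothetical attribute sweep for synthetic-scenario prompts; the attribute
# values below are illustrative examples, not an official taxonomy.
lighting = ["low light", "bright daylight", "backlit"]
condition = ["bruised", "unblemished"]
grip = ["firm stem", "loose stem"]

prompts = [
    f"Pick up the {c} apple with a {g} under {l}"
    for l, c, g in product(lighting, condition, grip)
]
print(len(prompts))  # 12 prompt variants, one per attribute combination
```

Each variant can then be fed to the video2world pipeline shown earlier, optionally together with a conditioning image of the scene.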
Step 4: Validate with Cosmos Reason
Cosmos Reason is a spatiotemporally aware physical AI reasoning model that critiques synthetic video data based on training data quality standards. Sample evaluation prompts include:
- Does the robot grasp the apple properly?
- Are joint angles within safe limits?
- Any motion artifacts or object collisions?
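Once a critic model has answered those questions per clip, filtering comes down to keeping only clips that pass every check. The verdict schema below is entirely hypothetical (the real Cosmos Reason output format is not shown here); the sketch only illustrates the gating step before synthetic clips enter a training set:

```python
# Hypothetical verdict format: one dict per generated clip, mapping each
# evaluation question to a boolean. The real Cosmos Reason schema may differ.
verdicts = [
    {"clip": "gen_0001.mp4", "grasp_ok": True,  "joints_safe": True,  "no_artifacts": True},
    {"clip": "gen_0002.mp4", "grasp_ok": True,  "joints_safe": False, "no_artifacts": True},
    {"clip": "gen_0003.mp4", "grasp_ok": False, "joints_safe": True,  "no_artifacts": True},
]

def passing_clips(verdicts):
    """Keep only clips that pass every check before adding them to training data."""
    checks = ("grasp_ok", "joints_safe", "no_artifacts")
    return [v["clip"] for v in verdicts if all(v[k] for k in checks)]

print(passing_clips(verdicts))  # ['gen_0001.mp4']
```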
Cosmos Predict-2 Post-training Samples:
- Cosmos-Predict2-14B-Video2World-Sample-GR00T-Dreams-GR1: video- and text-based future visual world generation, post-trained on GR00T GR1 data
- Cosmos-Predict2-14B-Video2World-Sample-GR00T-Dreams-DROID: video- and text-based future visual world generation, post-trained on GR00T DROID data
Try Cosmos Predict-2 Today
Cosmos Predict-2 is now available on Hugging Face. Explore the GitHub repo for detailed setup instructions: nvidia-cosmos/cosmos-predict2
Learn more: NVIDIA Cosmos
Join our community for learning content, livestreams, and discussions: NVIDIA Omniverse Community