GR00T Wave: Dual Camera Robotics Foundation Model

Model Overview

GR00T Wave is a robotics foundation model fine-tuned from GR00T N1.5-3B on dual-camera manipulation data from the SO101 Wave dataset. It takes two RGB camera views plus robot state as input and outputs robot actions for manipulation tasks.

Key Features

Dual Camera Input: Processes synchronized dual-camera feeds for enhanced spatial understanding
Foundation Model Architecture: Built on the GR00T framework for robust robotics applications
300K Training Steps: Trained for 300,000 steps on manipulation demonstrations
Manipulation Focused: Optimized for robotic manipulation and control tasks
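
The card does not say how the two camera feeds are synchronized. As a minimal sketch, assuming each camera delivers frames with timestamps in seconds, frames can be paired by nearest timestamp within a tolerance. The function name, the camera names "front" and "wrist", and the 20 ms tolerance are illustrative assumptions, not part of the model or dataset.

```python
from bisect import bisect_left


def pair_frames(ts_a, ts_b, tolerance_s=0.02):
    """Pair each timestamp in ts_a with the nearest timestamp in ts_b.

    Both lists must be sorted ascending. Returns (index_a, index_b) pairs
    whose timestamps differ by at most tolerance_s seconds.
    """
    pairs = []
    for i, t in enumerate(ts_a):
        j = bisect_left(ts_b, t)
        # Only the neighbours just before and just after t can be nearest.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(ts_b)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(ts_b[k] - t))
        if abs(ts_b[best] - t) <= tolerance_s:
            pairs.append((i, best))
    return pairs


# Hypothetical timestamps for two cameras (seconds).
front = [0.000, 0.033, 0.067, 0.100]
wrist = [0.002, 0.036, 0.150]
print(pair_frames(front, wrist))  # [(0, 0), (1, 1)]
```

Frames without a close enough partner are dropped rather than paired with a stale image, which is usually the safer choice when the pairs feed a control policy.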

Model Details

Model Type: GR00T Robotics Foundation Model
Training Data: SO101 Wave 300K Dual Camera Dataset
Architecture: Transformer-based with dual camera encoders
Training Steps: 300,000 steps with checkpoints at 150K and 300K
Input Modalities: Dual RGB cameras, robot state
Output: Robot actions and control commands

Usage

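from transformers import AutoModel

# Load the weights from the Hub. trust_remote_code=True executes Python code
# shipped in the repository, so review that code before enabling it.
model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)

# Note: loading the weights alone is not enough to drive a robot.
# This model requires a specialized robotics inference pipeline.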

Training Configuration

Base Model: GR00T N1.5-3B
Dataset: SO101 Wave 300K Dual Camera
Training Framework: Custom robotics training pipeline
Batch Size: Optimized for dual camera inputs
Optimization: AdamW with custom learning rate scheduling
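
The card does not specify the learning rate schedule. One common choice to pair with AdamW is linear warmup followed by cosine decay, sketched below in plain Python. Only the 300,000 total steps come from this card. The peak learning rate and warmup length are placeholder assumptions, and the actual schedule used for training may differ.

```python
import math


def lr_at_step(step, total_steps=300_000, peak_lr=1e-4, warmup_steps=15_000):
    """Linear warmup to peak_lr, then cosine decay to zero.

    peak_lr and warmup_steps are hypothetical values for illustration.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))


# Inspect the schedule at the start, the end of warmup, and both checkpoints.
for step in (0, 15_000, 150_000, 300_000):
    print(step, lr_at_step(step))
```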

Model Files

The repository contains:

SafeTensors Model Files:
- model-00001-of-00002.safetensors (4.7GB)
- model-00002-of-00002.safetensors (2.4GB)
Configuration Files:
- config.json
- model.safetensors.index.json
Training Checkpoints:
- checkpoint-150000/ (16GB)
- checkpoint-300000/ (16GB)
Training Metadata:
- trainer_state.json
- training_args.bin
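
The two checkpoint folders account for 32GB of the repository, so for inference it can be worth skipping them. The sketch below uses the ignore_patterns argument of huggingface_hub.snapshot_download. It assumes the checkpoints sit in top-level checkpoint-* folders, as the listing above suggests. The helper names are illustrative.

```python
from fnmatch import fnmatch

REPO_ID = "cagataydev/gr00t-wave"  # repository name from the Usage section

# Assumption: training checkpoints live under top-level "checkpoint-*/" folders.
IGNORE_PATTERNS = ["checkpoint-*/*"]


def is_skipped(path, patterns=IGNORE_PATTERNS):
    """Return True if a repository file path matches any ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_final_weights(local_dir=None):
    """Download the repository without the training checkpoints."""
    # Imported here so the pattern helper above works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        ignore_patterns=IGNORE_PATTERNS,
        local_dir=local_dir,
    )


# Preview which files would be skipped, without touching the network.
# The checkpoint file name is a made-up example.
for name in ("config.json", "model-00001-of-00002.safetensors",
             "checkpoint-150000/example.bin"):
    print(name, "skipped" if is_skipped(name) else "kept")
```

Calling download_final_weights() requires a prior login if the repository is private.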

Evaluation

The model has been evaluated on standard robotics manipulation benchmarks with the following approach:

Evaluation Steps: 150 per checkpoint
Trajectory Count: 5 trajectories per evaluation
Data Configuration: SO100 dual camera setup
Metrics: Success rate, manipulation accuracy, and task completion
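
As a rough illustration of the success-rate metric, the sketch below computes it from per-trajectory outcomes. The outcomes are made up and are not results for this model. With only 5 trajectories per evaluation, the metric moves in steps of 0.2, so small differences between checkpoints should be read with caution.

```python
def success_rate(outcomes):
    """Fraction of trajectories that succeeded."""
    if not outcomes:
        raise ValueError("need at least one trajectory outcome")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical outcomes for 5 evaluation trajectories.
outcomes = [True, True, False, True, False]
print(success_rate(outcomes))  # 0.6
```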

Applications

This model is suitable for:

Robotic Manipulation: Pick and place operations
Dual Camera Systems: Tasks that benefit from two camera viewpoints
Manufacturing Automation: Assembly and quality control
Research: Foundation for robotics research and development

Technical Specifications

Model Size: ~7.1GB (SafeTensors format)
Total Repository Size: ~40GB (including checkpoints)
Inference Requirements: GPU with sufficient VRAM for transformer inference
Framework Compatibility: Transformers, PyTorch
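
A lower bound on VRAM can be estimated from the weights alone. The sketch below assumes roughly 3 billion parameters, as the base model name GR00T N1.5-3B suggests, and shows the footprint if every weight were stored in BF16 or in F32. Activations, image buffers, and framework overhead come on top of this and are not modeled here.

```python
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}


def weight_footprint_gb(num_params, dtype):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: about 3 billion parameters (rounded).
for dtype in ("BF16", "F32"):
    print(dtype, weight_footprint_gb(3_000_000_000, dtype))
```

The ~7.1GB SafeTensors size falls between the two estimates (6GB and 12GB), which would be consistent with a mix of the two precisions.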

Installation

# Install required dependencies
pip install transformers torch torchvision
pip install huggingface_hub

# Login to HuggingFace (required for private model)
huggingface-cli login

Limitations

Requires specialized robotics inference pipeline
Optimized for specific dual camera configurations
Performance may vary with different robot platforms
Requires adequate computational resources for real-time inference

Model Card

This model card describes the GR00T Wave model, including its capabilities, limitations, and intended use cases.

Ethical Considerations

This model is designed for robotics research and industrial applications. Users should ensure:

Safe deployment in robotics systems
Appropriate safety measures for physical robot control
Compliance with relevant safety standards
Responsible use in manufacturing and research environments
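
One simple software-side measure is to bound every commanded action before it reaches the robot. The sketch below clamps joint targets to position limits and caps the change per control step. The function name, the limits, and the step size are hypothetical and must come from the actual robot's specification. A guard like this complements hardware safeguards such as an emergency stop and does not replace them.

```python
def clamp_action(target, current, lower, upper, max_step):
    """Clamp joint targets to limits and cap the change per control step."""
    safe = []
    for t, c, lo, hi in zip(target, current, lower, upper):
        t = min(max(t, lo), hi)                       # enforce joint limits
        delta = min(max(t - c, -max_step), max_step)  # limit per-step motion
        safe.append(c + delta)
    return safe


# Hypothetical two-joint example (radians).
print(clamp_action(
    target=[2.0, -0.5],
    current=[0.0, 0.0],
    lower=[-1.0, -1.0],
    upper=[1.0, 1.0],
    max_step=0.25,
))  # [0.25, -0.25]
```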

Version History

v1.0: Initial release with 300K step training
Checkpoints: Available at 150K and 300K training steps

Support

For technical questions and implementation support, please refer to the model documentation and community resources.

Safetensors Model Size: 3B params
Tensor Types: F32, BF16
