GR00T Wave: Dual Camera Robotics Foundation Model

Model Overview

GR00T Wave is a robotics foundation model based on GR00T N1.5-3B and trained on dual-camera manipulation data from the SO101 Wave dataset. It takes two synchronized camera views plus the robot state as input and outputs robot actions for manipulation tasks.

Key Features

  • Dual Camera Input: Processes synchronized dual-camera feeds for enhanced spatial understanding
  • Foundation Model Architecture: Built on the GR00T framework for robust robotics applications
  • 300K Training Steps: Trained for 300,000 steps on manipulation demonstrations
  • Manipulation Focused: Optimized for robotic manipulation and control tasks

Model Details

  • Model Type: GR00T Robotics Foundation Model
  • Training Data: SO101 Wave 300K Dual Camera Dataset
  • Architecture: Transformer-based with dual camera encoders
  • Training Steps: 300,000 steps with checkpoints at 150K and 300K
  • Input Modalities: Dual RGB cameras, robot state
  • Output: Robot actions and control commands

Usage

from transformers import AutoModel

# Load the model weights. trust_remote_code=True allows Transformers to run
# any custom model code defined in the repository.
model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)

# Note: loading the weights is only the first step. Running the model on a
# robot requires a specialized robotics inference pipeline.

Training Configuration

  • Base Model: GR00T N1.5-3B
  • Dataset: SO101 Wave 300K Dual Camera
  • Training Framework: Custom robotics training pipeline
  • Batch Size: Optimized for dual camera inputs
  • Optimization: AdamW with custom learning rate scheduling

Model Files

The repository contains:

  • SafeTensors Model Files:
    • model-00001-of-00002.safetensors (4.7GB)
    • model-00002-of-00002.safetensors (2.4GB)
  • Configuration Files:
    • config.json
    • model.safetensors.index.json
  • Training Checkpoints:
    • checkpoint-150000/ (16GB)
    • checkpoint-300000/ (16GB)
  • Training Metadata:
    • trainer_state.json
    • training_args.bin
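
The split between the two weight shards is recorded in model.safetensors.index.json, whose "weight_map" entry maps each tensor name to the shard file that stores it. The sketch below groups tensors by shard. It is a minimal illustration: the tensor names in the example index are made up, and the real index in this repository will list different ones.

```python
import json
from collections import defaultdict


def tensors_per_shard(index: dict) -> dict:
    """Group tensor names by the shard file listed in the index's weight_map."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Hypothetical, heavily shortened index. Only the shard file names come from
# the file listing above; the tensor names are placeholders.
example_index = json.loads("""
{
  "weight_map": {
    "layer_a.weight": "model-00001-of-00002.safetensors",
    "layer_a.bias": "model-00001-of-00002.safetensors",
    "layer_b.weight": "model-00002-of-00002.safetensors"
  }
}
""")

for shard, names in sorted(tensors_per_shard(example_index).items()):
    print(f"{shard}: {len(names)} tensors")
```

To inspect the real file, replace the example with json.load() on a local copy of model.safetensors.index.json.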

Evaluation

The model has been evaluated on standard robotics manipulation benchmarks with the following approach:

  • Evaluation Steps: 150 per checkpoint
  • Trajectory Count: 5 trajectories per evaluation
  • Data Configuration: SO100 dual camera setup
  • Metrics: Success rate, manipulation accuracy, and task completion
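
As an illustration of the success-rate metric, the sketch below computes it from per-trajectory outcomes. The outcomes are placeholders chosen for the example, not measured results for this model, and the function name is an assumption rather than part of any evaluation script in the repository.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of trajectories marked as successful."""
    if not outcomes:
        raise ValueError("need at least one trajectory outcome")
    return sum(outcomes) / len(outcomes)


# Placeholder outcomes for 5 trajectories (illustrative only, NOT real results).
example_outcomes = [True, True, False, True, False]
print(f"success rate: {success_rate(example_outcomes):.0%}")
```

With 5 trajectories per evaluation, the success rate can only move in steps of 20 percentage points, so small differences between checkpoints should be read with caution.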

Applications

This model is suitable for:

  • Robotic Manipulation: Pick and place operations
  • Dual Camera Systems: Tasks that benefit from two camera viewpoints
  • Manufacturing Automation: Assembly and quality control
  • Research: Foundation for robotics research and development

Technical Specifications

  • Model Size: ~7.1GB (SafeTensors format)
  • Total Repository Size: ~40GB (including checkpoints)
  • Inference Requirements: GPU with sufficient VRAM for transformer inference
  • Framework Compatibility: Transformers, PyTorch
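
The size figures can be checked against the file listing in the Model Files section. The sketch below simply adds up the listed sizes; the small JSON and metadata files are ignored, which is why the total lands slightly under the rounded ~40GB figure.

```python
# Sizes in GB as listed in the Model Files section.
SHARD_SIZES_GB = {
    "model-00001-of-00002.safetensors": 4.7,
    "model-00002-of-00002.safetensors": 2.4,
}
CHECKPOINT_SIZES_GB = {
    "checkpoint-150000/": 16.0,
    "checkpoint-300000/": 16.0,
}


def total_gb(sizes: dict) -> float:
    """Sum the listed sizes and round to one decimal place."""
    return round(sum(sizes.values()), 1)


model_gb = total_gb(SHARD_SIZES_GB)
repo_gb = round(model_gb + total_gb(CHECKPOINT_SIZES_GB), 1)

print(f"model weights: {model_gb} GB")
print(f"weights plus checkpoints: {repo_gb} GB")
```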

Installation

# Install required dependencies
pip install transformers torch torchvision
pip install huggingface_hub

# Login to HuggingFace (required for private model)
huggingface-cli login
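
Since the two training checkpoints account for most of the repository size, it may be worth skipping them when only the final weights are needed. The following is a sketch under assumptions: it uses snapshot_download from huggingface_hub with an ignore pattern, assumes the login step above has already been done, and the helper names (is_skipped, download_weights) and the example checkpoint file name are hypothetical.

```python
from fnmatch import fnmatch

REPO_ID = "cagataydev/gr00t-wave"
# Glob pattern for files to skip; it matches everything under checkpoint-*/.
IGNORE_PATTERNS = ["checkpoint-*"]


def is_skipped(path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Local preview of which repository paths the ignore patterns exclude."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_weights(local_dir=None) -> str:
    """Download the repository without the training checkpoints."""
    # Imported here so the preview helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        ignore_patterns=IGNORE_PATTERNS,
        local_dir=local_dir,
    )


# Preview only; "optimizer.pt" is a made-up file name for illustration.
for path in ["config.json", "checkpoint-150000/optimizer.pt"]:
    print(path, "-> skipped" if is_skipped(path) else "-> downloaded")
```

Calling download_weights() performs the actual download and returns the local folder path.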

Limitations

  • Requires specialized robotics inference pipeline
  • Optimized for specific dual camera configurations
  • Performance may vary with different robot platforms
  • Requires adequate computational resources for real-time inference

Model Card

This model card describes the GR00T Wave model, including its capabilities, limitations, and intended use cases.

Ethical Considerations

This model is designed for robotics research and industrial applications. Users should ensure:

  • Safe deployment in robotics systems
  • Appropriate safety measures for physical robot control
  • Compliance with relevant safety standards
  • Responsible use in manufacturing and research environments

Version History

  • v1.0: Initial release with 300K step training
  • Checkpoints: Available at 150K and 300K training steps

Support

For technical questions and implementation support, please refer to the model documentation and community resources.
