YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Traffic3D: Lightweight Monocular 3D Traffic Scene Reconstruction

Python 3.8+ PyTorch PyG License: MIT

A modular, end-to-end pipeline for reconstructing semantically consistent 3D traffic scenes from a single RGB image. Designed for near real-time inference (β‰₯15 FPS on RTX 3090) with all GNN components under 500K parameters.

Architecture Overview

RGB Image (HΓ—WΓ—3)
    β”‚
    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 1: Input         β”‚
β”‚  Augmentation           β”‚  β†’ 5-channel tensor [RGB + Positional + Edge]
β”‚  β€’ Positional Encoding  β”‚
β”‚  β€’ Sobel/Canny Edges    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 2: Segmentation  β”‚
β”‚  β€’ Lightweight UNet     β”‚  β†’ Semantic map S (HΓ—WΓ—K)
β”‚  β€’ Edge Weighting       β”‚  β†’ S'(x,y) = S(x,y) * (1 + Ξ±*C(x,y))
β”‚  β€’ Boundary Head (SBCB) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 3: Primitives    β”‚
β”‚  β€’ Connected Components β”‚  β†’ Cuboids, Cylinders, Cones, Planes
β”‚  β€’ PCA-based Fitting    β”‚  β†’ Scene Graph (nodes + edges)
β”‚  β€’ Graph Construction   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 4: GNN           β”‚
β”‚  β€’ GraphSAGE / GATv2    β”‚  β†’ Refined relational features
β”‚  β€’ Edge Feature Inject  β”‚  β†’ Improved spatial consistency
β”‚  β€’ LayerNorm + Dropout  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 5: Point Cloud   β”‚
β”‚  β€’ Surface Sampling     β”‚  β†’ 2K-20K 3D points
β”‚  β€’ Gaussian Noise       β”‚  β†’ Class/Instance/Primitive labels
β”‚  β€’ PLY Export           β”‚  β†’ Optional GNN features per point
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Features

  • Monocular 3D: Reconstructs 3D scenes from a single RGB image β€” no LiDAR, stereo, or depth sensors required
  • Edge-Aware Segmentation: Sobel/Canny edge confidence maps improve boundary IoU by β‰₯15% over baseline
  • Primitive-Based Representation: Vehiclesβ†’cuboids, pedestriansβ†’cylinders, treesβ†’cones, road/skyβ†’planes
  • Lightweight GNN: All three GNN variants (GraphSAGE, GATv2, Hybrid) are under 500K parameters
  • Modular Design: Each stage is independently testable, trainable, and replaceable
  • 4-Phase Training: Pretrain β†’ Edge fine-tune β†’ GNN β†’ End-to-end

Installation

pip install torch torchvision torch_geometric scipy scikit-learn numpy

Quick Start

import torch
from traffic3d.models.pipeline import Traffic3DPipeline

# Initialize pipeline
pipeline = Traffic3DPipeline(
    num_classes=19,       # Cityscapes classes
    base_ch=32,           # Lightweight UNet (4.3M params)
    gnn_type='sage',      # or 'gat', 'hybrid'
    edge_method='sobel',  # or 'canny'
    points_per_primitive=512,
)

# Forward pass
rgb = torch.randint(0, 256, (1, 3, 512, 1024), dtype=torch.uint8)
results = pipeline(rgb, training=False)

# Access outputs
segmentation = results['seg_outputs']['segmentation']  # [1, 512, 1024]
primitives = results['primitives'][0]                    # List of Primitive objects
point_cloud = results['point_clouds'][0]                 # PointCloudOutput

# Save point cloud
from traffic3d.models.point_cloud import PointCloudGenerator
PointCloudGenerator.save_ply(point_cloud, 'scene.ply')

Pipeline Stages

Stage 1: Input Augmentation

Channel Description Purpose
0-2 RGB (normalized) Visual features
3 Positional Encoding P(x,y) Vertical depth prior (top=far, bottom=near)
4 Edge Confidence C(x,y) Boundary detection for edge weighting

Stage 2: Edge-Weighted Semantic Segmentation

Lightweight UNet with edge weighting and auxiliary boundary supervision (SBCB-style, zero inference overhead):

  • Edge Weighting: S'(x,y) = S(x,y) * (1 + Ξ± * C(x,y))
  • Loss: L_total = L_ce_edge + Ξ» * L_boundary (Ξ»=0.4)

Stage 3: Primitive Extraction + Scene Graph

Object Type Primitive Fitting Method
Vehicles/Buildings Cuboid PCA-based orientation
Pedestrians Cylinder Bounding extent
Trees Cone Bounding extent
Road/Sky Plane PCA normal estimation
  • Node Features (26D): [class_embedding(16), centroid(3), size(3), orientation(4)]
  • Edge Features (5D): [distance, adjacency_flag, relative_position(3)]

Stage 4: GNN Relational Refinement

Model Architecture Parameters Description
GraphSAGE EdgeAwareSAGEConv Γ— 2 ~29K Custom MessagePassing with edge injection
GATv2 GATv2Conv (4-head + 1-head) ~29K Dynamic attention with native edge_dim
Hybrid SAGE + GAT + learned gate ~62K Automatic blending of both approaches

Stage 5: 3D Point Cloud Generation

  • ~512 points sampled per primitive surface
  • Gaussian noise (Οƒ β‰ˆ 0.02) for realism
  • Output: 2K-20K points with class/instance/primitive labels
  • PLY export for visualization

Training Strategy

4-Phase Training

from traffic3d.models.pipeline import Traffic3DPipeline, Traffic3DTrainer

pipeline = Traffic3DPipeline(num_classes=19)
trainer = Traffic3DTrainer(pipeline, device=torch.device('cuda'))

trainer.phase1_pretrain_segmentation(train_loader, epochs=30, lr=1e-3)
trainer.phase2_finetune_edge_weighted(train_loader, epochs=15, lr=5e-4, lambda_boundary=0.4)
trainer.phase3_train_gnn(graph_dataset, epochs=50, lr=1e-3)
trainer.phase4_end_to_end(train_loader, epochs=10, lr=1e-4)

Loss Functions

Loss Formula Use
EdgeWeightedCE CE * (1 + Ξ±*C(x,y)) Segmentation with boundary focus
BoundaryLoss Binary CE on boundary (on-the-fly GT) Boundary refinement
CombinedSegLoss L_ce + Ξ» * L_boundary (Ξ»=0.4) Full segmentation training
RelationalConsistency Contrastive on GNN features Scene graph training
ChamferDistance Bidirectional nearest-neighbor 3D quality evaluation

Evaluation Metrics & Targets

Metric Target Description
3D IoU ~0.68 3D bounding box overlap
Centroid L2 ~0.49m Primitive position accuracy
Edge Graph Accuracy ~78% Scene graph correctness (F1)
Chamfer Distance ~0.041 Point cloud reconstruction quality
Boundary IoU +15% Improvement over non-edge baseline
FPS β‰₯15 RTX 3090 real-time throughput

Ablation Studies

from traffic3d.utils.evaluation import AblationStudy
ablation = AblationStudy(device=torch.device('cuda'))
results = ablation.run_all()
# Ablates: Ξ», GNN architecture, edge method, points per primitive
print(ablation.summary_table())

Verified Parameter Budget

GNN=sage     | GNN: 28,736   | Under 500K: βœ“ | Total Pipeline: 4.36M
GNN=gat      | GNN: 29,312   | Under 500K: βœ“ | Total Pipeline: 4.36M
GNN=hybrid   | GNN: 62,016   | Under 500K: βœ“ | Total Pipeline: 4.39M

Datasets

Dataset Use Classes
Cityscapes Primary training 19 semantic
BDD100K Robustness testing 19 semantic
CARLA Synthetic 3D GT supervision Configurable

Project Structure

traffic3d/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ input_augmentation.py   # Stage 1: Positional + Edge encoding
β”‚   β”œβ”€β”€ segmentation.py         # Stage 2: Lightweight UNet + edge weighting
β”‚   β”œβ”€β”€ primitive_extraction.py # Stage 3: Primitives + scene graph
β”‚   β”œβ”€β”€ gnn_refinement.py       # Stage 4: GraphSAGE / GATv2 / Hybrid GNN
β”‚   β”œβ”€β”€ point_cloud.py          # Stage 5: Surface sampling + PLY export
β”‚   └── pipeline.py             # End-to-end pipeline + 4-phase trainer
β”œβ”€β”€ losses/
β”‚   └── __init__.py             # EdgeCE, BoundaryLoss, ChamferDistance, etc.
β”œβ”€β”€ utils/
β”‚   └── evaluation.py           # Metrics, Evaluator, AblationStudy
β”œβ”€β”€ data/ and configs/

Optimization for Edge Deployment

  1. INT8 Quantization: 4Γ— memory reduction, <1% accuracy drop
  2. TensorRT Export: UNet β†’ ONNX β†’ TensorRT for 2-3Γ— speedup
  3. Structured Pruning: Remove 30% UNet channels with fine-tuning
  4. GNN Batching: Batch multiple scene graphs per forward pass
  5. Adaptive LOD: Points-per-primitive based on object distance

Suggested Research Extensions

  1. Temporal GNN: Video consistency via temporal edges between frames
  2. Depth Anything V2: Replace depth prior with metric depth estimation
  3. Superquadric Fields: Differentiable superquadrics (SuperOcc-style)
  4. Multi-Scale GNN: Hierarchical local + global message passing
  5. Self-Supervised Pre-training: Contrastive learning on unlabeled driving data
  6. Dynamic Object Tracking: Velocity estimation via primitive tracking

References

  • MonoScene β€” Monocular 3D SSC (CVPR 2022)
  • VoxFormer β€” Sparse Voxel Transformer (CVPR 2023)
  • STDC-Seg β€” Real-time Segmentation (CVPR 2021)
  • SBCB β€” Boundary-Conditioned Backbone (2023)
  • GATv2 β€” Dynamic Graph Attention (ICLR 2022)
  • GraphSAGE β€” Inductive Representation Learning (NeurIPS 2017)
  • SuperOcc β€” Superquadric Occupancy (2025)
  • Depth Anything V2 β€” Monocular Depth Foundation Model
  • REACT β€” Real-time Scene Graph Generation (2024)

License

MIT License

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Papers for HugMaster2002/traffic3d-monocular-reconstruction