EnhancedViTUNet for Front-to-BEV Prediction

This model takes a front-view RGB image and predicts a Bird’s-Eye View (BEV) image.

  • Architecture: Vision Transformer (ViT) encoder + U-Net style decoder
  • Training: synthetic Gazebo11 simulation dataset, using an ROI-masked L1 loss combined with a VGG perceptual loss
  • Input size: 384×384 RGB
  • Output size: 384×384 RGB BEV
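
The ROI-masked L1 term mentioned under Training can be sketched as below. This is only an illustration under stated assumptions: the card does not say how the mask is defined or how the loss is normalized, so the sketch assumes a binary (H, W) mask and averages the absolute error over in-ROI pixels only. It uses NumPy for clarity rather than the training framework, and the names `roi_masked_l1`, `roi_mask`, and `eps` are hypothetical, not taken from the model's code. The VGG perceptual term is not shown.

```python
import numpy as np


def roi_masked_l1(pred, target, roi_mask, eps=1e-8):
    """Mean absolute error restricted to a region of interest.

    pred, target: float arrays of shape (H, W, C), e.g. (384, 384, 3).
    roi_mask:     array of shape (H, W); nonzero marks pixels inside the ROI.
    eps:          small constant that avoids division by zero for an empty mask.

    Assumption: the loss is normalized by the number of in-ROI values,
    not by the full image size. The model card does not specify this.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Binarize the mask and add a channel axis so it broadcasts over C.
    mask = (np.asarray(roi_mask) != 0).astype(np.float64)[..., None]

    abs_err = np.abs(pred - target) * mask      # zero outside the ROI
    n_valid = mask.sum() * pred.shape[-1]       # in-ROI pixels times channels
    return float(abs_err.sum() / (n_valid + eps))


# Toy example at the card's 384x384 RGB resolution.
H = W = 384
pred = np.zeros((H, W, 3))
target = np.ones((H, W, 3))
target[:, W // 2:, :] = 5.0                     # large error, but outside the ROI
roi_mask = np.zeros((H, W))
roi_mask[:, : W // 2] = 1                       # ROI = left half of the image

loss = roi_masked_l1(pred, target, roi_mask)
empty_loss = roi_masked_l1(pred, target, np.zeros((H, W)))
```

In the toy example, the error in the right half of the image does not contribute to `loss` because those pixels fall outside the mask, and an all-zero mask yields a loss of zero rather than a division error.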