EnhancedViTUNet for Front-to-BEV Prediction
This model takes a front-view RGB image and predicts a Bird’s-Eye View (BEV) image.
- Architecture: Vision Transformer (ViT) encoder + U-Net style decoder
- Training: a synthetic Gazebo11 simulation dataset, with an ROI-masked L1 loss plus a VGG perceptual loss
- Input size: 384×384 RGB
- Output size: 384×384 RGB BEV
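
The training objective listed above can be sketched roughly as follows. This is a minimal NumPy illustration, not the model's actual training code: the function names, the `feature_fn` argument (a hypothetical stand-in for a frozen VGG feature extractor), the `perceptual_weight` value, and the choice to apply the perceptual term to the full image rather than the ROI are all assumptions.

```python
import numpy as np


def masked_l1(pred, target, roi_mask, eps=1e-8):
    """Mean absolute error over the pixels where roi_mask is 1.

    pred, target: float arrays of shape (H, W, 3)
    roi_mask:     array of shape (H, W), 1 inside the region of interest, 0 outside
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    mask = roi_mask.astype(pred.dtype)[..., None]
    abs_err = np.abs(pred - target) * mask
    # Normalise by the number of masked values, not by the full image size.
    denom = mask.sum() * pred.shape[-1] + eps
    return float(abs_err.sum() / denom)


def combined_loss(pred, target, roi_mask, feature_fn, perceptual_weight=0.1):
    """ROI-masked L1 plus an L1 distance in feature space.

    feature_fn:        hypothetical callable mapping an image to a feature array
                       (in practice, intermediate activations of a frozen VGG).
    perceptual_weight: assumed value, not taken from the model card.
    """
    pixel_term = masked_l1(pred, target, roi_mask)
    feat_term = float(np.mean(np.abs(feature_fn(pred) - feature_fn(target))))
    return pixel_term + perceptual_weight * feat_term
```

In a real training loop the arrays would be 384×384 framework tensors and `feature_fn` would return activations from a pretrained VGG network; the toy version here only shows how the mask restricts the pixel term to the region of interest.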