---
tags:
- vision
- 3D
- 3D object detection
datasets:
- omni3d
metrics:
- AP
---

# 3D Object Detection with Cube R-CNN

Cube R-CNN, a model for 3D object detection, is described in [**Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild**](https://arxiv.org/abs/2207.10660) and released in this [repository](https://github.com/facebookresearch/omni3d).

## Overview

An overview of the model and its architecture is shown below.

<img src="https://s3.amazonaws.com/moonup/production/uploads/1666115971617-634ededbd049354d7ee4b557.png" width=700px/>

## Training Data

Cube R-CNN was trained on Omni3D, a large benchmark for 3D object detection in the wild.

## Demo: Inference on Any Image

The model detects objects in 3D from a single image. It covers 50 distinct object categories, including *car, truck, chair, table, cabinet, books*, and many more.
The model assumes that the focal length of the image is known in order to predict the correct metric scale.
However, users can provide any focal length, in which case the predictions are on a "relative" scale.

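To see why the focal length fixes the metric scale, it can help to look at the standard pinhole camera relation, where an object's height in pixels equals the focal length (in pixels) times its physical height divided by its depth. The sketch below is generic geometry, not Cube R-CNN's internal computation; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def depth_from_size(focal_px: float, object_height_m: float, pixel_height: float) -> float:
    """Invert the pinhole relation pixel_height = focal_px * object_height_m / depth."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * object_height_m / pixel_height


# Hypothetical example: a 1.5 m tall object that spans 150 px in the image.
true_depth = depth_from_size(focal_px=1000.0, object_height_m=1.5, pixel_height=150.0)
assumed_depth = depth_from_size(focal_px=500.0, object_height_m=1.5, pixel_height=150.0)

# Halving the assumed focal length halves the recovered depth:
# the scene layout is preserved, but only up to a global scale factor.
ratio = assumed_depth / true_depth

print(true_depth, assumed_depth, ratio)  # 10.0 5.0 0.5
```

Under this simplified model, an incorrect focal length rescales depth uniformly, which is one way to read the "relative" scale mentioned above.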

For example, we can predict 3D objects from COCO images with a user-defined focal length of 4.0, as shown below.

<img src="https://github.com/facebookresearch/omni3d/blob/main/.github/generalization_coco.png?raw=true" width=500px/>

The output above is produced by our demo:

```bash
python demo/demo.py \
--config cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/image_inputs" \
--threshold 0.25 --focal 4.0 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo
```
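
If you want to launch the demo from a Python script, for instance to try several thresholds or focal lengths, the same command can be assembled as an argument list. This is a minimal sketch: `build_demo_command` is a hypothetical helper, not part of the repository, and it only reuses the flags and paths shown in the command above.

```python
import shlex


def build_demo_command(input_folder, threshold=0.25, focal=4.0, output_dir="output/demo"):
    """Assemble the demo invocation shown above as an argument list (hypothetical helper)."""
    config = "cubercnn://omni3d/cubercnn_DLA34_FPN.yaml"
    weights = "cubercnn://omni3d/cubercnn_DLA34_FPN.pth"
    return [
        "python", "demo/demo.py",
        "--config", config,
        "--input-folder", input_folder,
        "--threshold", str(threshold),
        "--focal", str(focal),
        "--display",
        # Trailing KEY VALUE pairs, exactly as in the shell command above.
        "MODEL.WEIGHTS", weights,
        "OUTPUT_DIR", output_dir,
    ]


cmd = build_demo_command("datasets/image_inputs")
# Print the command for inspection instead of running it.
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` from a checkout of the repository with its dependencies installed.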

## Checkpoints

You can find model checkpoints in the original [model zoo](https://github.com/facebookresearch/omni3d/blob/main/MODEL_ZOO.md).

## Intended Use and Limitations

Cube R-CNN is a data-driven method trained on an annotated dataset, Omni3D. The purpose of the project is to advance 3D computer vision and 3D object recognition. The dataset contains a *pedestrian* category, which we acknowledge as a potential issue in the case of unethical applications of our model.

The limitations of our approach include erroneous predictions, especially for faraway objects, and mistakes in predicting rotation and depth. Our evaluation reports an analysis across various depths and object sizes to better understand performance.