
	---
	tags:
	- vision
	- 3D
	- 3D object detection
	datasets:
	- omni3d
	metrics:
	- AP
	---

	# 3D Object Detection with Cube R-CNN

	Cube R-CNN, a model for 3D object detection, is described in [Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild](https://arxiv.org/abs/2207.10660) and released in this [repository](https://github.com/facebookresearch/omni3d).

	## Overview
	An overview of the model and its architecture is shown below.

	<img src="https://s3.amazonaws.com/moonup/production/uploads/1666115971617-634ededbd049354d7ee4b557.png" width=700px/>

	## Training Data

	Cube R-CNN was trained on Omni3D, a large benchmark for 3D object detection in the wild.

	## Demo: Inference on Any Image

	The model detects objects in 3D from a single image. It covers 50 distinct object categories, including car, truck, chair, table, cabinet, books, and many more.
	The model assumes a known focal length for the image in order to predict the correct metric scale.
	However, users can provide any focal length; the predictions are then on a "relative" scale rather than a metric one.
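
To build intuition for why a wrong focal length gives "relative" rather than metric predictions, here is a minimal sketch under a plain pinhole-camera assumption. It is an illustration only, not Cube R-CNN's code: the helper `depth_from_height` and all numbers are hypothetical, and reading "relative" as a single common scale factor is an assumption of this sketch.

```python
def depth_from_height(focal_px: float, object_height_m: float, box_height_px: float) -> float:
    """Pinhole relation: an object of metric height H spanning h pixels lies at depth z = f * H / h."""
    return focal_px * object_height_m / box_height_px


# Hypothetical scene: (metric height in meters, projected height in pixels).
objects = {"car": (1.5, 100.0), "chair": (1.0, 50.0)}

true_focal_px = 1000.0              # assumed true focal length, in pixels
guessed_focal_px = 2.0 * true_focal_px  # a user-supplied guess that is off by 2x

for name, (height_m, height_px) in objects.items():
    z_true = depth_from_height(true_focal_px, height_m, height_px)
    z_guess = depth_from_height(guessed_focal_px, height_m, height_px)
    # Every depth is stretched by the same factor, so the layout of the
    # scene is preserved even though absolute distances are not metric.
    print(f"{name}: true depth {z_true} m, guessed depth {z_guess}, ratio {z_guess / z_true}")
```

In this simplified model, depth grows linearly with the assumed focal length, so a guess that is off by a constant factor rescales all depths by that factor while keeping the ordering of objects intact.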

	For example, we can predict 3D objects from COCO images with a user-defined focal length of 4.0, as shown below.

	<img src="https://github.com/facebookresearch/omni3d/blob/main/.github/generalization_coco.png?raw=true" width=500px/>

	The above output is produced by our demo:

	```bash
	python demo/demo.py \
	--config cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
	--input-folder "datasets/image_inputs" \
	--threshold 0.25 --focal 4.0 --display \
	MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
	OUTPUT_DIR output/demo
	```

	## Checkpoints

	You can find model checkpoints in the original [model zoo](https://github.com/facebookresearch/omni3d/blob/main/MODEL_ZOO.md).

	## Intended Use and Limitations

	Cube R-CNN is a data-driven method trained on an annotated dataset, Omni3D. The purpose of the project is to advance 3D computer vision and 3D object recognition. The dataset contains a pedestrian category, which we acknowledge as a potential issue in the case of unethical applications of our model.

	The limitations of our approach are erroneous predictions, especially for faraway objects, and mistakes in predicting rotation and depth. Our evaluation reports an analysis across various depths and object sizes to better understand performance.