update
- README.md +63 -2
- camera_serials.json +3 -0
- episode_id_to_path.json +3 -0
- intrinsics.json +3 -0
README.md
CHANGED
@@ -1,5 +1,4 @@
 # DROID Annotations
-
 This repo contains additional annotation data for the DROID dataset which we completed after the initial dataset release.
 
 Concretely, it contains the following information:
@@ -19,7 +18,60 @@ for a subset of the DROID episodes. Concretely, we provide the following three c
 - `cam2base_extrinsics.json`: Contains ~36k entries, with either the left or right camera calibrated with respect to the base.
 - `cam2cam_extrinsics.json`: Contains ~90k entries, with cam2cam relative poses and camera parameters for all of DROID.
 - `cam2base_extrinsic_superset.json`: Contains ~24k unique entries (~48k poses in total), with both the left and right cameras calibrated with respect to the base.
+These files map each episode's unique ID (see "Accessing Annotation Data" below) to a dictionary containing metadata (e.g., detection quality metrics; see Appendix G of the paper), as well as a map from camera ID to extrinsics values. The extrinsics are represented as a 6-element list of floats encoding the translation and rotation, and can easily be converted into a homogeneous pose matrix:
+```
+import numpy as np
+from scipy.spatial.transform import Rotation as R
+
+# Assume extrinsics is the 6-element list
+pos = extrinsics[0:3]
+rot_mat = R.from_euler("xyz", extrinsics[3:6]).as_matrix()
+
+# Build the homogeneous transformation matrix
+cam_to_target_extrinsics_matrix = np.eye(4)
+cam_to_target_extrinsics_matrix[:3, :3] = rot_mat
+cam_to_target_extrinsics_matrix[:3, 3] = pos
+```
+This represents a transformation matrix from the camera's frame to the target frame. Inverting it gives the transformation from the target frame to the camera frame (which is usually what is desired, e.g., when projecting a point in the robot frame into the camera frame).
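As a concrete illustration of the conversion and inversion described above, the sketch below builds a pose matrix from a made-up 6-element extrinsics list (the values are invented; real ones come from the annotation files) and checks the round trip:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Invented extrinsics: camera 0.5 m ahead and 0.3 m above the base,
# rotated 90 degrees about z. Real values come from the json files.
extrinsics = [0.5, 0.0, 0.3, 0.0, 0.0, np.pi / 2]

pos = extrinsics[0:3]
rot_mat = R.from_euler("xyz", extrinsics[3:6]).as_matrix()

# Camera-to-target homogeneous pose matrix
cam_to_target = np.eye(4)
cam_to_target[:3, :3] = rot_mat
cam_to_target[:3, 3] = pos

# Inverting gives the target-to-camera transform
target_to_cam = np.linalg.inv(cam_to_target)

# The camera origin, mapped into the target frame, is just the translation
origin_of_cam_in_target = cam_to_target @ np.array([0.0, 0.0, 0.0, 1.0])
```

Composing the matrix with its inverse recovers the identity, which is a cheap sanity check when debugging extrinsics.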
+
+As the raw DROID video files were recorded on ZED cameras and saved in SVO format, they contain camera intrinsics which can be used in conjunction with the above. For convenience, we have extracted and saved all of these annotations to `intrinsics.json` (~72k entries). This `json` has the following format:
+```
+<episode ID>:
+    <external camera 1's serial>: [fx, cx, fy, cy for camera 1]
+    <external camera 2's serial>: [fx, cx, fy, cy for camera 2]
+    <wrist camera's serial>: [fx, cx, fy, cy for the wrist camera]
+```
+One can thus convert the list for a particular camera into a projection matrix as follows:
+```
+import numpy as np
+
+# Assume intrinsics is the 4-element list
+fx, cx, fy, cy = intrinsics
+intrinsics_matrix = np.array([
+    [fx, 0, cx],
+    [0, fy, cy],
+    [0, 0, 1]
+])
+```
+Note that the intrinsics tend not to change much between episodes, but using the values for the specific episode of interest tends to give the best results.
+
+## Example Calibration Use Case
+Using the calibration information, one can project points in the robot's frame into pixel coordinates for the cameras. Below, we demonstrate how to map the robot gripper position to pixel coordinates for the external cameras with extrinsics in `cam2base_extrinsics.json`; see <TODO> for the full code.
+```
+gripper_position_base = <homogeneous gripper position in the base frame, as obtained from the TFDS episode; shape 4 x 1>
+cam_to_base_extrinsics_matrix = <extrinsics matrix for some camera>
+intrinsics_matrix = <intrinsics matrix for that same camera>
+
+# Invert to get the transform from the base frame to the camera frame
+base_to_cam_extrinsics_matrix = np.linalg.inv(cam_to_base_extrinsics_matrix)
+
+# Transform the gripper position into the camera frame, then drop the homogeneous component
+robot_gripper_position_cam = base_to_cam_extrinsics_matrix @ gripper_position_base
+robot_gripper_position_cam = robot_gripper_position_cam[:3]  # Now 3 x 1
+
+# Project into pixel coordinates
+pixel_positions = intrinsics_matrix @ robot_gripper_position_cam
+pixel_positions = pixel_positions[:2] / pixel_positions[2]  # Shape 2 x 1. Done!
+```
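Since the block above uses angle-bracket placeholders, here is a self-contained numeric sketch of the same projection steps. The intrinsics and camera pose are invented for illustration: the camera sits 1 m behind the base origin along its own optical (z) axis, so the gripper lands exactly on the principal point:

```python
import numpy as np

# Invented pinhole intrinsics [fx, cx, fy, cy] for illustration
fx, cx, fy, cy = 525.0, 320.0, 525.0, 240.0
intrinsics_matrix = np.array([[fx, 0.0, cx],
                              [0.0, fy, cy],
                              [0.0, 0.0, 1.0]])

# Invented cam-to-base pose: identity rotation, camera origin at z = -1 in base frame
cam_to_base_extrinsics_matrix = np.eye(4)
cam_to_base_extrinsics_matrix[:3, 3] = [0.0, 0.0, -1.0]

# Gripper 0.5 m above the base origin, homogeneous, shape 4 x 1
gripper_position_base = np.array([0.0, 0.0, 0.5, 1.0]).reshape(4, 1)

# Same steps as above: invert, transform, drop homogeneous component, project
base_to_cam_extrinsics_matrix = np.linalg.inv(cam_to_base_extrinsics_matrix)
robot_gripper_position_cam = (base_to_cam_extrinsics_matrix @ gripper_position_base)[:3]
pixel_positions = intrinsics_matrix @ robot_gripper_position_cam
pixel_positions = pixel_positions[:2] / pixel_positions[2]  # -> (cx, cy) here
```

Because the gripper lies on the camera's optical axis in this toy setup, the projected pixel is the principal point (320, 240); with real annotations the same code yields the gripper's pixel location in the corresponding video frame.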
 
 ## Accessing Annotation Data
 
@@ -35,4 +87,13 @@ import tensorflow as tf
 episode_paths = tf.io.gfile.glob("gs://gresearch/robotics/droid_raw/1.0.1/*/success/*/*/metadata_*.json")
 for p in episode_paths:
     episode_id = p[:-5].split("/")[-1].split("_")[-1]
-```
+```
+
+As using the above annotations requires these episode IDs (while the TFDS dataset only contains paths), we have included `episode_id_to_path.json` for convenience. The code snippet below loads this `json`, then inverts it to get the mapping from episode paths to IDs.
+```
+import json
+episode_id_to_path_path = "<path/to/episode_id_to_path.json>"
+with open(episode_id_to_path_path, "r") as f:
+    episode_id_to_path = json.load(f)
+episode_path_to_id = {v: k for k, v in episode_id_to_path.items()}
+```
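To make the ID-extraction line above concrete, here is how it behaves on a hypothetical metadata path (the path is invented for illustration; real ones come from the glob over the raw data bucket):

```python
# Hypothetical metadata path following the droid_raw layout (invented for illustration)
p = "gs://gresearch/robotics/droid_raw/1.0.1/LAB/success/2023-05-01/example/metadata_ABC123.json"

# Strip the ".json" suffix, take the filename, then take the part after the last underscore
episode_id = p[:-5].split("/")[-1].split("_")[-1]
# episode_id == "ABC123"
```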
camera_serials.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c8d346c51dcef71248e280e44dcd7985a94433f6911460b31dcc098cab30acc4
+size 12743876
episode_id_to_path.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e88ee7da94a40602cde4aacf22f2b48068f4f582c8ab38cf1888e06162a8085
+size 7237770
intrinsics.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:78c76755b075ae53e74a28c543bb1b185c50aa976458e95fbc9ba880a8cd2d51
+size 125812944