BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots
Abstract
BEAVR is a VR teleoperation system for robots that supports real-time control, data recording, and policy learning with low latency and compatibility with various visuomotor policies.
BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face. Code, datasets, and the VR app are available at https://github.com/ARCLab-MIT/BEAVR-Bot.
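As a rough illustration of the asynchronous think-act pattern mentioned in the abstract (a sketch under my own assumptions about names and structure, not BEAVR's actual implementation): a slower policy-inference thread fills a small queue with chunks of actions, while a faster control thread streams individual actions to the robot at a fixed rate.

import queue
import threading
import time

# Illustrative "think-act" loop with hypothetical helpers (policy,
# get_observation, send_command): inference and control run at different rates.
action_queue = queue.Queue(maxsize=2)   # small buffer keeps actions fresh

def think(policy, get_observation, horizon=20):
    # "Think": run policy inference asynchronously and enqueue action chunks.
    while True:
        obs = get_observation()          # latest camera frames + proprioception
        chunk = policy(obs, horizon)     # e.g., a list of 20 future actions
        try:
            action_queue.put(chunk, timeout=0.1)
        except queue.Full:
            pass                         # drop the stale chunk, keep thinking

def act(send_command, control_hz=50):
    # "Act": consume actions at the robot's control rate, independent of inference.
    chunk, idx = [], 0
    while True:
        if idx >= len(chunk):
            chunk, idx = action_queue.get(), 0   # block until a new chunk arrives
        send_command(chunk[idx])
        idx += 1
        time.sleep(1.0 / control_hz)

# threading.Thread(target=think, args=(my_policy, read_sensors), daemon=True).start()
# threading.Thread(target=act, args=(send_to_robot,), daemon=True).start()

Keeping the queue small means the control thread always acts on recent predictions, while inference latency never blocks the control loop.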
Community
My takeaway on the paper's core mapping algorithm
Pseudo-code
import numpy as np

# Inputs (known/calibrated once):
#   H_RV:    4x4 VR -> robot-base transform
#   H_REE_0: 4x4 initial robot end-effector (EE) pose
#   A:       optional 4x4 axis-alignment transform (identity if folded into H_RV)
#   s_t:     translation scale factor (e.g., 1.5)
# get_hand_landmarks, pose_filter, damped_least_squares_IK,
# rate_limit_and_collision_check, and send_joint_targets are device/robot-specific.

def normalize(v):
    return v / np.linalg.norm(v)

def make_homog(R, p):
    # Assemble a 4x4 homogeneous transform from rotation R and translation p.
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = p
    return H

# State (set at start):
H_VH_0 = None      # 4x4 hand pose at t = 0 (in the VR frame)
q_prev = q_home    # previous joint configuration, used as the IK seed

while True:
    # 1) Acquire VR hand landmarks: wrist p_w, index MCP p_idx, pinky MCP p_pky
    p_w, p_idx, p_pky = get_hand_landmarks()

    # 2) Build the hand-frame pose H_VH(t) via Gram-Schmidt orthogonalization
    u = normalize(p_idx - p_w)
    v = normalize(p_pky - p_w)
    xH = normalize(u - np.dot(u, v) * v)    # component of u orthogonal to v
    zH = normalize(np.cross(xH, v))
    yH = np.cross(zH, xH)                   # completes a right-handed frame
    R_H = np.column_stack((xH, yH, zH))
    H_VH_t = make_homog(R_H, p_w)

    # The first tick only stores the reference pose; a delta needs two samples
    if H_VH_0 is None:
        H_VH_0 = H_VH_t
        continue

    # 3) Relative hand motion in the VR frame
    dH_H = np.linalg.inv(H_VH_0) @ H_VH_t

    # 4) Map the delta into the robot frame (similarity transform by H_RV)
    dH_R = H_RV @ dH_H @ np.linalg.inv(H_RV)

    # 5) Optional axis alignment and translation scaling
    dH_R = A @ dH_R @ np.linalg.inv(A)
    dH_R[:3, 3] *= s_t

    # Compose with the initial EE pose to get the target pose
    H_R_EE_target = H_REE_0 @ dH_R

    # 6) Filter the target, solve IK, and command the robot
    H_R_EE_target = pose_filter(H_R_EE_target)
    q_target = damped_least_squares_IK(H_R_EE_target, q_seed=q_prev)
    q_target = rate_limit_and_collision_check(q_target, q_prev)
    send_joint_targets(q_target)
    q_prev = q_target
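In equation form (dropping the optional axis alignment A and the translation scale s_t), the loop above composes the initial end-effector pose with the hand's relative motion mapped from the VR frame into the robot frame:

$$H^{R}_{EE}(t) \;=\; H^{R}_{EE}(0)\, H_{RV}\,\big[H^{V}_{H}(0)\big]^{-1} H^{V}_{H}(t)\, H_{RV}^{-1}$$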
Yes, that's the idea! All the logic can be found under the operator component. You can think of this component as responsible for taking data from the input device and transforming it into something useful for the robot interface. In the VR case, that means building coordinate frames from raw keypoint data, computing the relative transformation matrix, and finding the next position by transforming the last robot coordinate frame.
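For illustration, a minimal sketch of what such an operator abstraction might look like (the class names, hand_frame_from_landmarks, and method signatures are my own hypothetical choices, not BEAVR's actual interface):

from abc import ABC, abstractmethod
import numpy as np

# Hypothetical operator abstraction: turns raw input-device data into target
# poses for the robot interface; names are illustrative, not BEAVR's API.
class Operator(ABC):
    @abstractmethod
    def read_input(self):
        """Return the latest raw data from the input device."""

    @abstractmethod
    def retarget(self, raw) -> np.ndarray:
        """Convert raw input into a 4x4 target end-effector pose."""

class VRHandOperator(Operator):
    def __init__(self, H_RV, H_REE_0, scale=1.5):
        self.H_RV, self.H_REE_0, self.scale = H_RV, H_REE_0, scale
        self.H_VH_0 = None                         # reference hand pose, set on first frame

    def read_input(self):
        return get_hand_landmarks()                # wrist, index MCP, pinky MCP (as above)

    def retarget(self, raw):
        H_VH_t = hand_frame_from_landmarks(*raw)   # hypothetical wrapper for step 2 above
        if self.H_VH_0 is None:
            self.H_VH_0 = H_VH_t
            return self.H_REE_0                    # no relative motion yet
        dH = self.H_RV @ np.linalg.inv(self.H_VH_0) @ H_VH_t @ np.linalg.inv(self.H_RV)
        dH[:3, 3] *= self.scale                    # scale only the translation part
        return self.H_REE_0 @ dH

Supporting a different input device would then come down to writing another Operator subclass with its own read_input and retarget.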