BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots
Abstract
BEAVR is a VR teleoperation system for robots that supports real-time control, data recording, and policy learning with low latency and compatibility with various visuomotor policies.
BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face. Code, datasets, and the VR app are available at https://github.com/ARCLab-MIT/BEAVR-Bot.
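As a rough illustration of the asynchronous think-act pattern mentioned in the abstract (a sketch under my own assumptions about names and structure, not BEAVR's actual implementation): a slower policy-inference thread fills a small queue with chunks of actions, while a faster control thread streams individual actions to the robot at a fixed rate.

import queue
import threading
import time

# Illustrative "think-act" loop with hypothetical helpers (policy,
# get_observation, send_command): inference and control run at different rates.
action_queue = queue.Queue(maxsize=2)   # small buffer keeps actions fresh

def think(policy, get_observation, horizon=20):
    # "Think": run policy inference asynchronously and enqueue action chunks.
    while True:
        obs = get_observation()          # latest camera frames + proprioception
        chunk = policy(obs, horizon)     # e.g., a list of 20 future actions
        try:
            action_queue.put(chunk, timeout=0.1)
        except queue.Full:
            pass                         # drop the stale chunk, keep thinking

def act(send_command, control_hz=50):
    # "Act": consume actions at the robot's control rate, independent of inference.
    chunk, idx = [], 0
    while True:
        if idx >= len(chunk):
            chunk, idx = action_queue.get(), 0   # block until a new chunk arrives
        send_command(chunk[idx])
        idx += 1
        time.sleep(1.0 / control_hz)

# threading.Thread(target=think, args=(my_policy, read_sensors), daemon=True).start()
# threading.Thread(target=act, args=(send_to_robot,), daemon=True).start()

Keeping the queue small means the control thread always acts on recent predictions, while inference latency never blocks the control loop.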
Community
My takeaway on the paper's core mapping algorithm
Pseudo-code
import numpy as np

# Inputs (known/calibrated once):
#   H_RV:    4x4 VR -> robot-base transform
#   H_REE_0: 4x4 initial robot end-effector (EE) pose
#   A:       optional 4x4 axis-alignment transform (identity if folded into H_RV)
#   s_t:     translation scale factor (e.g., 1.5)
# get_hand_landmarks, pose_filter, damped_least_squares_IK,
# rate_limit_and_collision_check, and send_joint_targets are device/robot-specific.

def normalize(v):
    return v / np.linalg.norm(v)

def make_homog(R, p):
    # Assemble a 4x4 homogeneous transform from rotation R and translation p.
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = p
    return H

# State (set at start):
H_VH_0 = None      # 4x4 hand pose at t = 0 (in the VR frame)
q_prev = q_home    # previous joint configuration, used as the IK seed

while True:
    # 1) Acquire VR hand landmarks: wrist p_w, index MCP p_idx, pinky MCP p_pky
    p_w, p_idx, p_pky = get_hand_landmarks()

    # 2) Build the hand-frame pose H_VH(t) via Gram-Schmidt orthogonalization
    u = normalize(p_idx - p_w)
    v = normalize(p_pky - p_w)
    xH = normalize(u - np.dot(u, v) * v)    # component of u orthogonal to v
    zH = normalize(np.cross(xH, v))
    yH = np.cross(zH, xH)                   # completes a right-handed frame
    R_H = np.column_stack((xH, yH, zH))
    H_VH_t = make_homog(R_H, p_w)

    # The first tick only stores the reference pose; a delta needs two samples
    if H_VH_0 is None:
        H_VH_0 = H_VH_t
        continue

    # 3) Relative hand motion in the VR frame
    dH_H = np.linalg.inv(H_VH_0) @ H_VH_t

    # 4) Map the delta into the robot frame (similarity transform by H_RV)
    dH_R = H_RV @ dH_H @ np.linalg.inv(H_RV)

    # 5) Optional axis alignment and translation scaling
    dH_R = A @ dH_R @ np.linalg.inv(A)
    dH_R[:3, 3] *= s_t

    # Compose with the initial EE pose to get the target pose
    H_R_EE_target = H_REE_0 @ dH_R

    # 6) Filter the target, solve IK, and command the robot
    H_R_EE_target = pose_filter(H_R_EE_target)
    q_target = damped_least_squares_IK(H_R_EE_target, q_seed=q_prev)
    q_target = rate_limit_and_collision_check(q_target, q_prev)
    send_joint_targets(q_target)
    q_prev = q_target
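In equation form (dropping the optional axis alignment A and the translation scale s_t), the loop above composes the initial end-effector pose with the hand's relative motion mapped from the VR frame into the robot frame:

$$H^{R}_{EE}(t) \;=\; H^{R}_{EE}(0)\, H_{RV}\,\big[H^{V}_{H}(0)\big]^{-1} H^{V}_{H}(t)\, H_{RV}^{-1}$$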
Yes, that's the idea! All the logic can be found under the operator component. You can think of this component as responsible for taking data from the input device and transforming it into something useful for the robot interface. In the VR case, that means building coordinate frames from raw keypoint data, computing the relative transformation matrix, and finding the next position by transforming the last robot coordinate frame.
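For illustration, a minimal sketch of what such an operator abstraction might look like (the class names, hand_frame_from_landmarks, and method signatures are my own hypothetical choices, not BEAVR's actual interface):

from abc import ABC, abstractmethod
import numpy as np

# Hypothetical operator abstraction: turns raw input-device data into target
# poses for the robot interface; names are illustrative, not BEAVR's API.
class Operator(ABC):
    @abstractmethod
    def read_input(self):
        """Return the latest raw data from the input device."""

    @abstractmethod
    def retarget(self, raw) -> np.ndarray:
        """Convert raw input into a 4x4 target end-effector pose."""

class VRHandOperator(Operator):
    def __init__(self, H_RV, H_REE_0, scale=1.5):
        self.H_RV, self.H_REE_0, self.scale = H_RV, H_REE_0, scale
        self.H_VH_0 = None                         # reference hand pose, set on first frame

    def read_input(self):
        return get_hand_landmarks()                # wrist, index MCP, pinky MCP (as above)

    def retarget(self, raw):
        H_VH_t = hand_frame_from_landmarks(*raw)   # hypothetical wrapper for step 2 above
        if self.H_VH_0 is None:
            self.H_VH_0 = H_VH_t
            return self.H_REE_0                    # no relative motion yet
        dH = self.H_RV @ np.linalg.inv(self.H_VH_0) @ H_VH_t @ np.linalg.inv(self.H_RV)
        dH[:3, 3] *= self.scale                    # scale only the translation part
        return self.H_REE_0 @ dH

Supporting a different input device would then come down to writing another Operator subclass with its own read_input and retarget.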