view article Article RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation By Alibaba-DAMO-Academy and 9 others • 29 days ago • 26
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 140
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 41
OpenX-LeRobot Collection Open X-Embodiment datasets in LeRobot format with standard transfomation (https://github.com/Tavish9/any4lerobot) • 34 items • Updated 12 days ago • 18
view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control By danaaubakirova and 3 others • Feb 4 • 173
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution Paper • 2312.06640 • Published Dec 11, 2023 • 49
PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining Paper • 2303.08789 • Published Mar 15, 2023 • 1
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 56
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Paper • 2411.13543 • Published Nov 20, 2024 • 19
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 35
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5, 2024 • 27
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published Nov 12, 2024 • 31
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 27