In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), featuring 55k real-world demonstration trajectories across 279 diverse tasks involving 61 different object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robot-related information, including multi-view RGB-D images, proprioceptive robot states, end-effector information, and linguistic task descriptions. We provide a thorough quantitative and qualitative analysis of RoboMIND across multiple dimensions, offering detailed insights into the diversity of our dataset. In our experiments, we conduct extensive real-world testing with four state-of-the-art imitation learning methods, demonstrating that training with RoboMIND data yields high manipulation success rates and strong generalization.
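To make the contents of each demonstration concrete, the sketch below shows one plausible in-memory representation of a single trajectory. The class and field names here are illustrative assumptions for this page, not the dataset's actual on-disk schema.

# A minimal sketch of one RoboMIND demonstration record, assuming hypothetical
# class and field names; the dataset's actual schema may differ.
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np


@dataclass
class DemoStep:
    rgb: Dict[str, np.ndarray]    # camera name -> HxWx3 uint8 image (multi-view RGB)
    depth: Dict[str, np.ndarray]  # camera name -> HxW float32 depth map
    joint_positions: np.ndarray   # proprioceptive robot state
    ee_pose: np.ndarray           # end-effector pose, e.g., xyz + quaternion
    gripper_open: float           # end-effector open/close state in [0, 1]


@dataclass
class DemoTrajectory:
    embodiment: str               # e.g., "Franka", "AgileX", "TienKung", "UR"
    task_name: str                # e.g., "FR-PlacePearBowl"
    instruction: str              # linguistic task description
    steps: List[DemoStep] = field(default_factory=list)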
For the Franka Emika Panda robots, we use cameras positioned at the top, left, and right viewpoints to record visual information along the task trajectories. For the AgileX and Tien Kung robots, we use their built-in cameras, and for the UR robots, an external top-mounted camera. All demonstrations are collected via high-quality human teleoperation and stored on a unified intelligence platform.
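This per-embodiment camera setup can be summarized as a small configuration mapping. A minimal sketch follows, where the view names are assumptions rather than the dataset's actual camera keys.

# Illustrative mapping from embodiment to recorded camera views, based on the
# setup described above; view names are assumptions, not the dataset's keys.
CAMERA_VIEWS = {
    "Franka":   ["top", "left", "right"],  # three external cameras
    "AgileX":   ["builtin"],               # robot's built-in cameras
    "TienKung": ["builtin"],
    "UR":       ["top"],                   # single external top-mounted camera
}

def views_for(embodiment: str) -> list:
    """Return the list of camera views recorded for a given embodiment."""
    return CAMERA_VIEWS[embodiment]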
Dataset overview. (a) Total trajectories categorized by embodiment; (b) trajectory lengths by embodiment; (c) total trajectories grouped by task category; and (d) total trajectories by object usage scenario.
Distribution of objects in RoboMIND, covering most daily-life settings: domestic, industrial, kitchen, office, and retail.
Left: A histogram of skill counts across tasks for the four embodiments. AgileX tasks typically combine two or three skills, extending the task horizon, while Tien Kung tasks vary in length, with some comprising up to five skills. Right: We visualize the AX-PutCarrot task with the AgileX robot, which involves three different skills.
Language Description Annotation. We provide refined linguistic annotations for 10,000 successful robot motion trajectories.
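As a rough illustration, a refined annotation for one trajectory might take a form like the following; the keys, example instruction, and skill decomposition shown here are assumptions, not the dataset's exact schema.

# Hypothetical shape of one refined language annotation; the released
# annotation files may use different keys and structure.
annotation = {
    "task_name": "AX-PutCarrot",                                   # task identifier
    "instruction": "Pick up the carrot and put it on the plate.",  # refined description
    "skills": ["pick", "move", "place"],                           # decomposed sub-skills
    "success": True,   # only successful trajectories receive refined annotations
}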
Visualization of failed data collection cases. We present two failure examples, from the Franka and AgileX robots. In the FR-PlacePlateInPlateRack task (second row), the Franka arm fails to align with the slot and the plate slips due to operator interference. In the AX-PutCarrot task (fourth row), the AgileX gripper unexpectedly opens and drops the carrot. Such failure cases were filtered out during quality inspection to maintain the quality of the dataset.
We conduct comprehensive experiments with four popular imitation learning methods, ACT, BAKU, RDT-1B, and OpenVLA, on selected RoboMIND tasks (listed below) to assess their performance and limitations; a minimal evaluation sketch follows the task list.
FR-PlacePearBowl
FR-SideCloseDrawer
FR-PlaceBluePink
TK-OpenTrashBin
TK-CloseTrashBin
TK-OpenDrawerLowerCabinet
AX-AppleYellowPlate
AX-CarrotGreenPlate
AX-UnpackBowl
UR-CloseTopWhiteDrawer
UR-PickRoundBread
AX-PackPlate
AX-AppleBluePlate
AX-PackBowl
AX-TakePotato
FR-OpenCapLid
FR-PickStrawberryInBowl
FR-SlideCloseDrawer
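As a rough illustration of how such an evaluation could be scored, the sketch below computes a per-task success rate over repeated rollouts. The `policy`, `make_env`, and episode interfaces are placeholders, not the actual APIs of ACT, BAKU, RDT-1B, or OpenVLA.

# Minimal sketch of computing a per-task success rate over repeated rollouts;
# the policy and environment interfaces are hypothetical placeholders.
def evaluate(policy, make_env, task_name: str, n_trials: int = 20) -> float:
    """Roll out the policy for n_trials episodes and report the success rate."""
    successes = 0
    for _ in range(n_trials):
        env = make_env(task_name)
        obs = env.reset()
        done, info = False, {}
        while not done:
            action = policy.act(obs)            # query the trained policy
            obs, done, info = env.step(action)  # advance one control step
        successes += int(info.get("success", False))
    return successes / n_trials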
For more details on the data analysis and experimental results, please refer to our paper.
@article{wu2024robomindbenchmarkmultiembodimentintelligence,
  title={RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation},
  author={Kun Wu and Chengkai Hou and Jiaming Liu and Zhengping Che and Xiaozhu Ju and Zhuqin Yang and Meng Li and Yinuo Zhao and Zhiyuan Xu and Guang Yang and Zhen Zhao and Guangyu Li and Zhao Jin and Lecheng Wang and Jilei Mao and Xinhua Wang and Shichao Fan and Ning Liu and Pei Ren and Qiang Zhang and Yaoxu Lyu and Mengzhen Liu and Jingyang He and Yulin Luo and Zeyu Gao and Chenxuan Li and Chenyang Gu and Yankai Fu and Di Wu and Xingyu Wang and Sixiang Chen and Zhenyu Wang and Pengju An and Siyuan Qian and Shanghang Zhang and Jian Tang},
  journal={arXiv preprint arXiv:2412.13877},
  year={2024}
}