LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

Abstract

Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings place significantly higher demands on the perception of physical-chemical transformations and on long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, the development of such agents has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity, from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for integrating perception, planning, and control in scientific agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.

Highlights

Comprehensive Simulation Suite: LabUtopia introduces a high-fidelity simulation and benchmarking platform tailored for scientific embodied agents, integrating LabSim, LabScene, and LabBench to support complex laboratory tasks.
LabSim: A high-fidelity simulator built on Isaac Sim, enhanced with a chemical engine to model reaction-driven transformations (e.g., color changes, product generation), enabling precise simulation of physical and chemical interactions.
LabScene: A procedural generation pipeline that creates diverse, physically plausible 3D laboratory scenes from more than 100 laboratory scenes and 100 instrument assets, all verified by domain experts, to provide realistic training environments.
LabBench: A hierarchical benchmark of 30 distinct tasks organized into five levels of complexity, from atomic manipulations to long-horizon mobile manipulation, for rigorous evaluation of embodied agents.

LabSim

LabSim is a high-fidelity simulation environment built on Isaac Sim, enhanced with a chemical engine that models reaction-driven transformations (e.g., color change, product generation). It supports rigid, deformable, and fluid objects, as well as chemical reactions, enabling precise simulation of laboratory phenomena.
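The page does not describe how LabSim's chemical engine represents reactions. As a rough illustration only, the sketch below assumes a rule-based design in which each rule lists stoichiometric reactants and products plus a display color; `ReactionRule`, `Container`, and `step_reactions` are hypothetical names, not part of LabSim or the Isaac Sim API, and the species are illustrative. When all reactants of a rule are present, the rule consumes them up to the limiting reagent, generates products, and changes the liquid color, which are the two effects named above.

```python
from dataclasses import dataclass, field

RGB = tuple[float, float, float]


@dataclass
class ReactionRule:
    """Hypothetical rule: reactants -> products, with a visual cue when it fires."""
    reactants: dict[str, float]  # species -> stoichiometric coefficient (must be > 0)
    products: dict[str, float]   # species -> stoichiometric coefficient
    color: RGB                   # liquid color applied when the rule fires


@dataclass
class Container:
    """Hypothetical vessel state: amount of each species (mol) and a display color."""
    contents: dict[str, float] = field(default_factory=dict)
    color: RGB = (1.0, 1.0, 1.0)

    def add(self, species: str, amount: float) -> None:
        self.contents[species] = self.contents.get(species, 0.0) + amount


def step_reactions(container: Container, rules: list[ReactionRule]) -> bool:
    """Fire every rule whose reactants are all present; return True if any rule fired."""
    fired = False
    for rule in rules:
        # The extent of reaction is set by the limiting reagent.
        extent = min(container.contents.get(s, 0.0) / c for s, c in rule.reactants.items())
        if extent <= 0.0:
            continue
        for s, c in rule.reactants.items():
            container.contents[s] -= c * extent
            if container.contents[s] <= 1e-12:  # treat tiny leftovers as fully consumed
                del container.contents[s]
        for s, c in rule.products.items():
            container.add(s, c * extent)  # product generation
        container.color = rule.color      # color change
        fired = True
    return fired


if __name__ == "__main__":
    # CuSO4 + 2 NaOH -> Cu(OH)2 (light-blue precipitate) + Na2SO4
    rule = ReactionRule({"CuSO4": 1, "NaOH": 2}, {"Cu(OH)2": 1, "Na2SO4": 1}, (0.4, 0.7, 1.0))
    beaker = Container()
    beaker.add("CuSO4", 0.01)
    beaker.add("NaOH", 0.01)
    step_reactions(beaker, [rule])
    print(beaker.contents, beaker.color)
```

A discrete rule table like this is easy to audit against textbook chemistry, but it ignores reaction kinetics; a fuller engine might instead advance reactions gradually over simulation steps.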

LabScene

LabScene is a procedural generation pipeline that synthesizes diverse, physically plausible 3D laboratory scenes. Built upon expert-verified assets, it employs a hybrid layout strategy to create scalable environments for training and evaluating embodied agents.
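The page does not spell out LabScene's hybrid layout strategy. One plausible reading, assumed here purely for illustration, combines a fixed anchor taken from a hand-authored template (a lab bench) with seeded random placement of instrument footprints on it, rejecting candidates that overlap items already placed. `Box` and `place_assets` are hypothetical names; a real generator would also need 3D collision checks, support surfaces, and reachability constraints.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned footprint on a bench top, in metres (hypothetical)."""
    x: float
    y: float
    w: float
    d: float

    def overlaps(self, other: "Box") -> bool:
        return not (self.x + self.w <= other.x or other.x + other.w <= self.x
                    or self.y + self.d <= other.y or other.y + other.d <= self.y)


def place_assets(bench: Box, footprints: dict[str, tuple[float, float]],
                 seed: int, max_tries: int = 100) -> dict[str, Box]:
    """Place each asset inside the fixed bench anchor without overlaps (rejection sampling)."""
    rng = random.Random(seed)  # seeded so a sampled layout can be reproduced
    placed: dict[str, Box] = {}
    for name, (w, d) in footprints.items():
        if w > bench.w or d > bench.d:
            raise ValueError(f"{name} does not fit on the bench")
        for _ in range(max_tries):
            cand = Box(rng.uniform(bench.x, bench.x + bench.w - w),
                       rng.uniform(bench.y, bench.y + bench.d - d), w, d)
            if all(not cand.overlaps(p) for p in placed.values()):
                placed[name] = cand
                break
        else:  # no break: every candidate collided
            raise RuntimeError(f"could not place {name} after {max_tries} tries")
    return placed


if __name__ == "__main__":
    bench = Box(0.0, 0.0, 2.0, 0.8)  # template anchor: a 2.0 m x 0.8 m bench top
    layout = place_assets(bench, {"balance": (0.3, 0.3), "beaker": (0.1, 0.1)}, seed=0)
    for name, box in layout.items():
        print(name, round(box.x, 3), round(box.y, 3))
```

Seeding the generator makes each sampled scene reproducible, which matters when the same layout has to be shared between training and evaluation.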

LabBench

LabBench is a hierarchical benchmark featuring a five-level task structure, from atomic manipulations to long-horizon mobile manipulation tasks. It includes 30 distinct tasks that evaluate agents’ perception, planning, and control capabilities in realistic laboratory settings.
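LabBench's scoring protocol is not described on this page. The sketch below assumes that each evaluation episode records a task, its level (numbered 1 to 5, from atomic actions to long-horizon mobile manipulation, since the intermediate level names are not given here), and a binary success flag, and it reports a success rate per level. The names `Episode` and `success_by_level`, the task identifiers in the example, and the choice to average per-task success rates within a level are all assumptions, not the benchmark's official metric.

```python
from collections import defaultdict
from dataclasses import dataclass

# Five levels: 1 = atomic actions ... 5 = long-horizon mobile manipulation.
NUM_LEVELS = 5


@dataclass(frozen=True)
class Episode:
    task: str      # hypothetical task identifier
    level: int     # 1..NUM_LEVELS
    success: bool


def success_by_level(episodes: list[Episode]) -> dict[int, float]:
    """Mean success rate per level, averaging over tasks so each task counts equally."""
    per_task: dict[tuple[int, str], list[bool]] = defaultdict(list)
    for ep in episodes:
        if not 1 <= ep.level <= NUM_LEVELS:
            raise ValueError(f"level must be in 1..{NUM_LEVELS}, got {ep.level}")
        per_task[(ep.level, ep.task)].append(ep.success)
    per_level: dict[int, list[float]] = defaultdict(list)
    for (level, _task), outcomes in per_task.items():
        per_level[level].append(sum(outcomes) / len(outcomes))
    return {lvl: sum(rates) / len(rates) for lvl, rates in sorted(per_level.items())}


if __name__ == "__main__":
    episodes = [
        Episode("pick_beaker", 1, True),
        Episode("pick_beaker", 1, False),
        Episode("pour_liquid", 1, True),
        Episode("run_titration", 5, False),
    ]
    print(success_by_level(episodes))
```

Averaging over tasks rather than pooling all episodes keeps a task with many rollouts from dominating its level's score; pooling would be the simpler alternative.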

BibTeX

@article{li2025labutopia,
  author    = {Li, Rui and Hu, Zixuan and Qu, Wenxi and Zhang, Jinouwen and Yin, Zhenfei and Zhang, Sha and Huang, Xuantuo and Wang, Hanqing and Wang, Tai and Pang, Jiangmiao and Ouyang, Wanli and Bai, Lei and Zuo, Wangmeng and Duan, Ling-Yu and Zhou, Dongzhan and Tang, Shixiang},
  title     = {LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents},
  journal   = {arXiv preprint arXiv:2505.22634},
  year      = {2025},
}