RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator

Xinhai Li1*, Jialin Li2*, Ziheng Zhang3†, Rui Zhang4, Fan Jia3,
Tiancai Wang3, Haoqiang Fan3, Kuo-Kun Tseng1‡, Ruiping Wang2‡
1Harbin Institute of Technology, Shenzhen 2Institute of Computing Technology, Chinese Academy of Sciences 3MEGVII Technology 4Zhejiang University
*Equal Contribution
†Project Lead ‡Corresponding Author
lixinhai a/t stu.hit.edu.cn {lijialin24s, wangruiping} a/t ict.ac.cn
{zhangziheng, wangtiancai} a/t megvii.com kktseng a/t hit.edu.cn
RoboGSim is an efficient, low-cost interactive platform with high-fidelity rendering. It synthesizes demonstrations with novel scenes, novel objects, and novel views, facilitating data scaling for policy learning. Additionally, it can perform closed-loop simulation for safe, fair, and realistic evaluation of different policy models.

Abstract

Efficient acquisition of real-world embodied data has become increasingly critical. However, large-scale demonstrations captured by teleoperation incur extremely high costs and fail to scale up the data size efficiently. Sampling episodes in a simulated environment is a promising way to collect data at scale, yet existing simulators fail to model texture and physics with high fidelity. To address these limitations, we introduce RoboGSim, a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine. RoboGSim mainly consists of four parts: Gaussian Reconstructor, Digital Twins Builder, Scene Composer, and Interactive Engine. It can synthesize simulated data with novel views, objects, trajectories, and scenes. RoboGSim also provides an online, reproducible, and safe evaluation for different manipulation policies. The real2sim and sim2real transfer experiments show high consistency in texture and physics. Moreover, the effectiveness of the synthetic data is validated on real-world manipulation tasks. We hope RoboGSim serves as a closed-loop simulator for fair comparison of policy learning methods.

Pipeline

Overview of the RoboGSim Pipeline: (1) Inputs: multi-view RGB image sequences and the MDH parameters of the robotic arm. (2) Gaussian Reconstructor: reconstructs the scene and objects with 3DGS, segments the robotic arm, and builds an MDH kinematic drive graph for accurate arm motion modeling. (3) Digital Twins Builder: performs mesh reconstruction of both the scene and the objects, then creates a digital twin in Isaac Sim, ensuring high fidelity in simulation. (4) Scene Composer: combines the robotic arm and objects in the simulation, identifies optimal test viewpoints using tracking, and renders images from new perspectives. (5) Interactive Engine: (i) the synthesized images with novel scenes/views/objects are used for policy learning; (ii) policy networks can be evaluated in a closed-loop manner; (iii) embodied data can be collected with VR or Xbox controllers.
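To make the MDH kinematic drive concrete, below is a minimal Python sketch of how MDH parameters could pose the segmented arm links. The modified DH (Craig) convention, the per-link segmentation, and all function names are illustrative assumptions, not the released RoboGSim code.

# Minimal sketch: pose each segmented arm link from MDH parameters.
# Convention, parameter layout, and names are assumptions for illustration.
import numpy as np

def mdh_transform(alpha, a, d, theta):
    """One-link homogeneous transform under the modified DH (Craig) convention:
    Rot_x(alpha) * Trans_x(a) * Rot_z(theta) * Trans_z(d)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,   a],
        [st * ca,  ct * ca, -sa,  -sa * d],
        [st * sa,  ct * sa,  ca,   ca * d],
        [0.0,      0.0,      0.0,  1.0],
    ])

def link_poses(mdh_params, joint_angles):
    """Chain the per-link transforms to get each link's pose in the base frame."""
    T = np.eye(4)
    poses = []
    for (alpha, a, d, theta_offset), q in zip(mdh_params, joint_angles):
        T = T @ mdh_transform(alpha, a, d, theta_offset + q)
        poses.append(T.copy())
    return poses

Each returned 4x4 pose would then be applied to the corresponding link's Gaussian primitives (means and rotations) before splatting, so the rendered arm follows the recorded joint trajectory.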

Real2Sim Novel Pose Synthesis

Real2Sim Novel Pose Synthesis: "Real" shows the capture of the real robotic arm from a new viewpoint. "RoboGSim" shows the rendering of the novel pose from the new viewpoint, driven by the real recorded trajectory. "Depth" shows the depth rendered by GS. "Diff" is the difference between the Real and rendered RGB images. We compute the pixel distance of the same point between Real and RoboGSim, which is 7.37 pixels.

Sim2Real Trajectory Replay

Sim2Real Trajectory Replay: The "Sim" row displays the video sequence collected from Isaac Sim. "Real" shows the real-world demonstration driven by the trajectory from simulation. "RoboGSim" is the GS rendering driven by the same trajectory. "Diff" indicates the differences between Real and the rendered results.

Novel Scene Synthesis

Novel Scene Synthesis: We show the results of migrating the robot arm into new scenes, including a factory, a shelf, and two outdoor environments. The high-fidelity multi-view renderings demonstrate that RoboGSim enables the robot arm to operate seamlessly across diverse scenes.
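One way such scene composition can be realized is sketched below: the segmented arm Gaussians are rigidly transformed into the new scene's frame and concatenated with the scene Gaussians before rendering. The attribute layout (means, quaternion rotations, scales, opacities, colors) follows a common 3DGS convention and is an assumption here, not the RoboGSim data format.

# Sketch of compositing arm splats into a new scene's splats.
import numpy as np

def quat_multiply(q1, q2):
    """Hamilton product of (w, x, y, z) quaternions, batched along the first axis."""
    w1, x1, y1, z1 = q1.T
    w2, x2, y2, z2 = q2.T
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ], axis=-1)

def compose_scene(arm, scene, R, t, q_R):
    """Place the arm splats in the scene frame and merge the two Gaussian sets.
    (View-dependent SH color terms are ignored in this simplified sketch.)"""
    placed_means = arm["means"] @ R.T + t
    placed_rots = quat_multiply(np.broadcast_to(q_R, arm["rotations"].shape), arm["rotations"])
    return {
        "means":     np.concatenate([scene["means"], placed_means]),
        "rotations": np.concatenate([scene["rotations"], placed_rots]),
        "scales":    np.concatenate([scene["scales"], arm["scales"]]),
        "opacities": np.concatenate([scene["opacities"], arm["opacities"]]),
        "colors":    np.concatenate([scene["colors"], arm["colors"]]),
    }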

RoboGSim as Synthesizer

RoboGSim as Synthesizer: The first two rows show real robot videos captured from the test viewpoint, illustrating successful and failed cases of the VLA model on the Pick task; the last two rows show the corresponding successful and failed cases on the Place task.
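As a rough illustration of how synthesized rollouts could be packaged for policy (e.g., VLA) training, the sketch below pairs each rendered observation with the language instruction and the recorded action. The record layout is an assumption for illustration, not the dataset format used by RoboGSim.

# Sketch of packaging synthesized frames and actions into one training episode.
from dataclasses import dataclass
import numpy as np

@dataclass
class SynthesizedStep:
    image: np.ndarray    # GS rendering from the training viewpoint (H, W, 3)
    instruction: str     # task description, e.g., "pick up the object"
    action: np.ndarray   # recorded end-effector or joint command for this step

def build_episode(frames, instruction, actions):
    """Zip rendered frames with their actions into one training episode."""
    return [SynthesizedStep(f, instruction, a) for f, a in zip(frames, actions)]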

RoboGSim as Evaluator

RoboGSim as Evaluator: The first two rows, labeled "Real" and "RoboGSim", show the footage captured from the real robot and RoboGSim, respectively. Both are driven by the trajectory generated by the same VLA network. In the third row, the left side shows a real-world inference where the robot arm exceeds its operational limits, resulting in a manual shutdown; the right side shows an instance where a wrong decision from the VLA network causes the robotic arm to collide with the table. The fourth row presents the simulation results from RoboGSim, where such dangerous collisions are avoided.
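The closed-loop evaluation idea can be summarized by the sketch below: the policy observes GS renderings, its actions are stepped in simulation, and unsafe commands (joint-limit violations, collisions) terminate the episode instead of damaging hardware. The callables and the returned state dictionary are placeholders, not a specific simulator API.

# Sketch of a closed-loop policy evaluation with simple safety checks.
import numpy as np

def evaluate_policy(policy, render, step_sim, joint_limits, max_steps=200):
    """Roll out `policy` in the simulator and report success / safety violations."""
    lo, hi = joint_limits
    for t in range(max_steps):
        obs = render()                      # GS rendering of the current simulator state
        action = policy(obs)                # e.g., a VLA network's predicted joint command
        if np.any(action < lo) or np.any(action > hi):
            return {"success": False, "violation": "joint_limit", "steps": t}
        state = step_sim(action)            # physics step in the simulator
        if state.get("collision", False):
            return {"success": False, "violation": "collision", "steps": t}
        if state.get("task_done", False):
            return {"success": True, "violation": None, "steps": t + 1}
    return {"success": False, "violation": None, "steps": max_steps}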

Internship Application

We are recruiting interns in the fields of Embodied Intelligence, Agents, 3DGS, etc. Feel free to apply by contacting zhangziheng a/t megvii.com

BibTeX

@misc{li2024robogsimreal2sim2realroboticgaussian,
  title={RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator},
  author={Xinhai Li and Jialin Li and Ziheng Zhang and Rui Zhang and Fan Jia and Tiancai Wang and Haoqiang Fan and Kuo-Kun Tseng and Ruiping Wang},
  year={2024},
  eprint={2411.11839},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2411.11839},
}