Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation
John Z. Zhang1,2, Maks Sorokin2*, Jan Brüdigam2*, Brandon Hung2*, Stephen Phillips2, Dmitry Yershov2,
Farzad Niroui2, Tong Zhao2, Leonor Fermoselle2, Xinghao Zhu2, Chao Cao2, Duy Ta2,
Tao Pang2, Jiuguang Wang2, Preston Culbertson2,3, Zachary Manchester1, and Simon Le Cléac'h2
1MIT   2RAI Institute   3Cornell
*Equal Contribution
This work was done in part during an internship at the RAI Institute.
Corresponding Email: jzhang3@mit.edu
Abstract

This paper presents a sim-to-real approach that enables legged robots to dynamically manipulate large and heavy objects with whole-body dexterity. Our key insight is that by performing test-time steering of a pre-trained whole-body control policy with a sample-based planner, we can enable these robots to solve a variety of dynamic loco-manipulation tasks. Interestingly, we find our method generalizes to a diverse set of objects and tasks with no additional tuning or training, and can be further enhanced by flexibly adjusting the cost function at test time. We demonstrate the capabilities of our approach through a variety of challenging loco-manipulation tasks on a Spot quadruped robot in the real world, including uprighting a tire heavier than the robot's nominal lifting capacity and dragging a crowd-control barrier larger and taller than the robot itself. Additionally, we show that the same approach can be generalized to humanoid loco-manipulation tasks, such as opening a door and pushing a table, in simulation.

Methods
[Figure: System Overview (left) and Dynamics Comparison (right)]

System Overview: Left: our method takes a hierarchical approach that combines a pre-trained whole-body control (WBC) policy (purple) with high-level sample-based model predictive control (MPC) (green). The low-level WBC policy takes the current state and the desired torso, arm, and leg commands as input and outputs joint-level commands for the quadruped or humanoid robot at $50$ Hz. The high-level sample-based MPC minimizes a task-specific cost function: it takes the current state estimate and solves, at $20$ Hz, for the desired torso, arm, and leg commands that are passed to the low-level policy. Right: illustrations comparing standard dynamics rollouts, where the actions $u$ are joint-level controls for the multi-body dynamics model, with our network-policy-augmented dynamics rollouts, where the actions $a$ are inputs to the low-level locomotion policy.
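The policy-augmented rollout described in the caption can be illustrated with a minimal sketch. This is not the authors' implementation: the source does not say which sampling scheme the planner uses, so the example assumes a simple "keep the best sampled sequence" rule, and it replaces the robot with a 1D point mass. The names `toy_policy`, `toy_dynamics`, `rollout_cost`, and `plan_by_sampling`, as well as the gains, noise scale, and cost weights, are all hypothetical. The only values taken from the text are the structure (the planner samples commands $a$, and the low-level policy turns them into controls $u$ inside the rollout) and the step `dt = 0.02` s, which corresponds to the stated $50$ Hz policy rate.

```python
import numpy as np


def toy_policy(state, command):
    """Hypothetical stand-in for the pre-trained WBC policy.

    Maps (state, high-level command a) to a low-level control u.
    Here: a proportional velocity tracker on a 1D point mass.
    """
    _, vel = state
    return 4.0 * (command - vel)


def toy_dynamics(state, control, dt):
    """Hypothetical stand-in for the multi-body dynamics model."""
    pos, vel = state
    vel = vel + control * dt
    pos = pos + vel * dt
    return np.array([pos, vel])


def rollout_cost(state, commands, goal, dt=0.02):
    """Policy-augmented rollout: x' = f(x, policy(x, a)).

    The planner never chooses u directly; it only chooses commands a.
    """
    cost = 0.0
    for a in commands:
        u = toy_policy(state, a)             # low-level policy in the loop
        state = toy_dynamics(state, u, dt)   # simulate one 50 Hz step
        cost += (state[0] - goal) ** 2 + 1e-3 * a ** 2
    return cost


def plan_by_sampling(state, nominal, goal, rng, num_samples=64, sigma=0.5):
    """Assumed sampling rule: perturb the nominal plan, keep the cheapest."""
    noise = rng.normal(0.0, sigma, size=(num_samples, nominal.shape[0]))
    candidates = nominal + noise
    candidates[0] = nominal  # keep the nominal so the result is never worse
    costs = np.array([rollout_cost(state, c, goal) for c in candidates])
    return candidates[np.argmin(costs)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.0, 0.0])
    plan = np.zeros(10)
    for _ in range(20):  # repeated re-planning, MPC style
        plan = plan_by_sampling(state, plan, goal=1.0, rng=rng)
    print("final plan cost:", rollout_cost(state, plan, 1.0))
```

Because the policy sits inside `rollout_cost`, changing the task only requires changing the cost term, which is consistent with the abstract's point that the cost function can be adjusted at test time without retraining the low-level policy.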

Spot Loco-Manipulation

Note: the Spot robot has a peak lift capacity of 11 kg and a continuous load capacity of 5 kg.

G1 Loco-Manipulation
Experimental Analysis
BibTeX
      @article{zhang2026sumo,
        title = {Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation},
        author = {Zhang, John Z. and Sorokin, Maks and Br{\"u}digam, Jan and Hung, Brandon and Phillips, Stephen and Yershov, Dmitry and Niroui, Farzad and Zhao, Tong and Fermoselle, Leonor and Zhu, Xinghao and Cao, Chao and Ta, Duy and Pang, Tao and Wang, Jiuguang and Culbertson, Preston and Manchester, Zachary and Le Cl\'eac'h, Simon},
        journal = {arXiv preprint arXiv:2604.08508},
        year = {2026},
        url = {https://arxiv.org/abs/2604.08508}
      }