Stability-Driven Motion Generation for Object-Guided Human-Human Co-Manipulation


Jiahao Xu1   Xiaohan Yuan2   Xingchen Wu1   Chongyang Xu3   Kun Li1   Buzhen Huang1*

1Tianjin University    2National University of Singapore    3Sichuan University
CVPR, 2026

[Paper]
[Supp.]
[Code]



Co-manipulation requires multiple humans to synchronize their motions with a shared object while ensuring reasonable interactions, maintaining natural poses, and preserving stable states. However, most existing motion generation approaches are designed for single-character scenarios or fail to account for payload-induced dynamics. In this work, we propose a flow-matching framework that ensures the generated co-manipulation motions align with the intended goals while maintaining naturalness and effectiveness. Specifically, we first introduce a generative model that derives explicit manipulation strategies from the object's affordance and spatial configuration, which guide the motion flow toward successful manipulation. To improve motion quality, we then design an adversarial interaction prior that promotes natural individual poses and realistic inter-person interactions during co-manipulation. In addition, we incorporate a stability-driven simulation into the flow-matching process, which refines unstable interaction states through sampling-based optimization and directly adjusts the vector-field regression to promote more effective manipulation. Experimental results demonstrate that our method achieves higher contact accuracy, lower penetration, and better distributional fidelity than state-of-the-art human-object interaction baselines. The code will be made publicly available.

Visual Appearance

Given an object mesh and its trajectory (green), our method generates coordinated motions that are consistent with the trajectory while remaining natural and physically plausible for co-manipulation.

Method


Overview. Given an input object trajectory, our method generates co-manipulation motions conditioned on object 6D poses and their BPS features (a). To ensure that the motions are consistent with the object trajectory, an affordance-informed manipulation strategy (b) is introduced to produce explicit contact signals as flow guidance. Building on this design, we further propose an adversarial interaction prior (c) and a stability-driven simulation (d) to enhance motion quality.
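The backbone of the pipeline is conditional flow matching: a network regresses a velocity field that transports noise toward motion samples, conditioned on the object's 6D poses and BPS features. Below is a minimal, generic training-step sketch of this idea; the network architecture, dimensions (`motion_dim`, `cond_dim`), and all variable names are illustrative assumptions, not the paper's actual implementation, and the contact guidance, interaction prior, and stability terms are omitted.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Toy velocity-field network; the paper's real model is not specified here."""
    def __init__(self, motion_dim, cond_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim))

    def forward(self, x_t, t, cond):
        # Predict the velocity at interpolated state x_t, time t, given conditioning
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def cfm_loss(model, x1, cond):
    """Standard conditional flow-matching objective with a linear probability path:
    regress the predicted velocity toward the constant target x1 - x0."""
    x0 = torch.randn_like(x1)           # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1)      # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1         # linear interpolation between noise and data
    v_target = x1 - x0                  # target velocity along the straight path
    v_pred = model(x_t, t, cond)
    return ((v_pred - v_target) ** 2).mean()

# Hypothetical shapes: 69-D motion frames, 32-D object-pose + BPS conditioning.
model = VectorField(motion_dim=69, cond_dim=32)
x1 = torch.randn(8, 69)
cond = torch.randn(8, 32)
loss = cfm_loss(model, x1, cond)
```

At inference, one would integrate the learned velocity field from noise to a motion sample (e.g., with an Euler solver), with the contact signals from (b) steering the flow as guidance.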

Input Condition Representations

Three input conditions drive our framework: object geometry via BPS, contact-region priors via an Affordance Map, and explicit flow guidance via the Trajectory & Contact Points.

BPS Encoding

Fixed-length geometric encoding of the object surface via nearest-neighbor distances from sampled basis points.
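The BPS (Basis Point Set) idea can be sketched in a few lines: fix a random set of basis points once, then describe any object surface by the distance from each basis point to its nearest surface point, which yields a fixed-length feature regardless of how many surface points are sampled. The point counts and sampling ranges below are illustrative assumptions.

```python
import numpy as np

def bps_encode(surface_points, basis_points):
    """Encode a point-sampled surface as a fixed-length BPS feature:
    for each basis point, the distance to its nearest surface point."""
    # (B, 1, 3) - (1, S, 3) -> pairwise distances of shape (B, S)
    d = np.linalg.norm(basis_points[:, None, :] - surface_points[None, :, :], axis=-1)
    return d.min(axis=1)  # (B,) feature, independent of the number of surface points S

rng = np.random.default_rng(0)
basis = rng.uniform(-1.0, 1.0, size=(512, 3))      # fixed basis, shared across all objects
surface = rng.uniform(-0.5, 0.5, size=(2000, 3))   # points sampled from one object's surface
feat = bps_encode(surface, basis)                  # shape (512,)
```

Because the basis is shared across objects, BPS features of different meshes are directly comparable and can be fed to a standard fixed-input network.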

Affordance Map

Per-point scores (red) highlighting semantically meaningful grasp and support regions on the object surface.

Trajectory & Contact Points

Object trajectory with per-frame contact points (red), used as explicit conditioning anchors for the flow-matching process.

Qualitative Results

Across diverse objects and trajectories, our method generates natural, physically plausible co-manipulation motions that remain consistent with the target object trajectory.

Qualitative Comparisons

Compared with state-of-the-art human-object interaction baselines, our method produces more accurate contacts, less penetration, and more natural coordinated poses.

Citation

@inproceedings{xu2026stability,
    title={Stability-Driven Motion Generation for Object-Guided Human-Human Co-Manipulation},
    author={Xu, Jiahao and Yuan, Xiaohan and Wu, Xingchen and Xu, Chongyang and Li, Kun and Huang, Buzhen},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026}
}