---
layout: page
order: 1
permalink: /hgap/
description: Humanoid Control with a Generalist Planner
title: Humanoid Control with a Generalist Planner
exclude: true
---
<style>
  header { display: none; }
  footer { display: none; }
  .image-left {
    display: block;
    margin-left: auto;
    margin-right: 20px;
    float: left;
    height: 290px;
  }
  body {
    color: #000000;
    /* font-family: 'Computer Modern Serif', 'Droid Sans', Helvetica, Arial, sans-serif; */
    font-weight: normal;
    font-size: 1.125rem;
    position: relative;
    background-color: #FFFFFF;
    /* max-width: 100%; */
    content-width: 100%;
    /* line-height: 1.2; */
  }
  .content-container {
    content-width: 100%;
    padding: 1.5rem 1.5rem;
  }
  .inline { display: inline-block; }
  .caption {
    width: 200px;
    text-align: center;
  }
</style>

# H-GAP: Humanoid Control with a Generalist Planner

Zhengyao Jiang*
(UCL)
   
Yingchen Xu*
(UCL, FAIR at Meta)
   
Nolan Wagener
(Georgia Tech)
   
Yicheng Luo
(UCL)
   
Michael Janner
(UC Berkeley)
   
Edward Grefenstette
(UCL)
      
Tim Rocktäschel
(UCL)
      
Yuandong Tian
(FAIR at Meta)
✨ ICLR 2024 Spotlight ✨
[Paper]            [Code]            [Poster]            [Twitter]

We present the Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion capture data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For a humanoid with 56 degrees of freedom, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviours. Further, without any learning from online interactions, it can flexibly transfer these behaviours to solve novel downstream control tasks via planning. Notably, H-GAP surpasses established MPC baselines that have access to the ground-truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we conduct a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains via additional data but not compute.


## H-GAP Overview

[Figure: H-GAP architecture overview]

Left: A VQ-VAE that discretizes continuous state-action trajectories.

Middle: A Transformer that autoregressively models the prior distribution over latent codes, conditioned on the initial state.

Right: Zero-shot adaptation to novel tasks via MPC planning with the learned Prior Transformer, underscoring H-GAP's versatility as a generalist model.
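
To make the left panel concrete, here is a minimal PyTorch sketch of the trajectory VQ-VAE idea. All names, dimensions, and layer choices are illustrative assumptions rather than the released implementation: a 1D-convolutional encoder compresses a window of concatenated state-action vectors along the time axis, each latent is snapped to its nearest codebook entry with a straight-through estimator, and a mirrored decoder reconstructs the trajectory.

```python
# Illustrative sketch only: shapes, layer sizes, and names are assumptions,
# not the official H-GAP implementation.
import torch
import torch.nn as nn

class TrajectoryVQVAE(nn.Module):
    def __init__(self, obs_act_dim=120, latent_dim=64, codebook_size=512):
        super().__init__()
        # Encoder downsamples along the time axis (stride-2 convolutions).
        self.encoder = nn.Sequential(
            nn.Conv1d(obs_act_dim, 256, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, latent_dim, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, latent_dim)
        # Decoder mirrors the encoder to reconstruct states and actions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, 256, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(256, obs_act_dim, kernel_size=4, stride=2, padding=1),
        )

    def quantize(self, z):
        # z: (batch, latent_dim, T_latent). Snap each latent vector to its
        # nearest codebook entry; the straight-through trick keeps gradients.
        flat = z.permute(0, 2, 1)                              # (B, T, D)
        dists = torch.cdist(flat, self.codebook.weight[None])  # (B, T, K)
        idx = dists.argmin(dim=-1)                             # (B, T)
        z_q = self.codebook(idx).permute(0, 2, 1)              # (B, D, T)
        return z + (z_q - z).detach(), idx

    def forward(self, traj):
        # traj: (batch, obs_act_dim, T) continuous state-action trajectory.
        z_q, idx = self.quantize(self.encoder(traj))
        return self.decoder(z_q), idx
```

Training would add the usual VQ-VAE reconstruction, codebook, and commitment losses; the resulting discrete code sequences are what the Prior Transformer models autoregressively, conditioned on the initial state.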


## Imitation Learning

We train H-GAP on the MoCapAct dataset, which contains over 500k rollouts displaying various motions from the CMU MoCap dataset. Starting from the same state, H-GAP with greedy decoding can recover the various behaviours from the reference clips. Note that action noise is added to the final output of H-GAP, so the imitation cannot be achieved by mere memorisation. A minimal sketch of this greedy-decoding loop follows the videos below.

Walking
(CMU-002-01)
Backwards
(CMU-041-02)
Long Jumping
(CMU-013-11)

Jumping Jack
(CMU-014-06)
Cart Wheeling
(CMU-049-07)
Turning
(CMU-010-04)

The reference snippets are short, but H-GAP with greedy decoding can continue the behaviours after the reference snippets end, sometimes forming a closed loop.

Turning
Raise hand
Shifting
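
A minimal sketch of the greedy-decoding loop used for imitation, under assumed interfaces: `prior`, `vqvae`, and `env` are hypothetical stand-ins for the Prior Transformer, the VQ-VAE decoder, and the humanoid environment, and each latent code is assumed to decode to a short chunk of actions rather than a single step.

```python
# Hypothetical interfaces: `prior`, `vqvae`, and `env` stand in for the real
# Prior Transformer, VQ-VAE decoder, and humanoid environment.
import torch

@torch.no_grad()
def greedy_imitation(prior, vqvae, env, init_state, num_codes, noise_std=0.05):
    codes = []
    for _ in range(num_codes):
        # Condition on the initial state and the codes chosen so far, then
        # pick the single most likely next latent code (greedy decoding).
        logits = prior(init_state, codes)              # (codebook_size,)
        codes.append(int(logits.argmax()))
        # Decode the newest code into a short chunk of actions.
        actions = vqvae.decode_actions(init_state, codes)[-1]
        for action in actions:
            # Inject action noise so the rollout cannot succeed by pure
            # memorisation of the reference clip.
            env.step(action + noise_std * torch.randn_like(action))
    return codes
```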

## Downstream Control

To test H-GAP’s zero-shot control performance as a generalist model, we design a suite of six control tasks: speed, forward, backward, shift left, rotate, and jump. H-GAP matches or beats offline RL methods trained individually for each task. It also outperforms MPC baselines with access to the true dynamics, demonstrating the benefits of a learned action space.

[Figure: downstream control results]

H-GAP with MPC planning achieves sensible performance on a wide range of downstream tasks in a zero-shot fashion. Starting from an initial state that is irrelevant or even contradictory to the objective, the agent has to figure out a proper transition between motor skills; for example, it may start with a forward motion when the task is to move backwards. A sketch of this planning loop follows the videos below.

Speed


Rotate


Jump


Forward


Shift Left


Backwards
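
For reference, here is a sketch of how sample-based MPC over the learned latent space could look. All interfaces (`prior.sample`, `vqvae.decode`, `reward_fn`) are assumptions for illustration; the released code is authoritative. Candidate code sequences are sampled from the prior, decoded into trajectories, scored with the task reward, and only the first action of the best candidate is executed before replanning.

```python
# Hypothetical interfaces: `prior.sample`, `vqvae.decode`, and `reward_fn`
# are illustrative stand-ins, not the released H-GAP API.
import torch

@torch.no_grad()
def mpc_step(prior, vqvae, reward_fn, state, num_samples=64, horizon_codes=4):
    # 1. Sample candidate latent-code sequences from the Prior Transformer.
    candidates = prior.sample(state, num_samples, horizon_codes)  # (N, H)
    # 2. Decode each candidate into a predicted state-action trajectory.
    states, actions = vqvae.decode(state, candidates)             # (N, T, ...)
    # 3. Score each trajectory with the task objective
    #    (e.g. forward velocity for "forward", target speed for "speed").
    returns = reward_fn(states, actions).sum(dim=-1)              # (N,)
    # 4. Receding horizon: execute only the first action, then replan.
    return actions[returns.argmax(), 0]
```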