Three New Papers Point to the Same Conclusion: Humanoid Control Is Becoming a Data Problem

A trio of arXiv papers this week suggests the field is converging on diffusion-based approaches trained on massive motion datasets, but the real bottleneck might not be algorithms.

By James Chen

1 hour ago · 5 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

What happens when three separate research teams, working on the same fundamental problem, arrive at remarkably similar conclusions within the same week?

That's the question I found myself asking after reviewing a cluster of humanoid locomotion papers that hit arXiv over the past few days. MuGen, SCRIPT, and Direct Dynamic Retargeting (DDR) each tackle the challenge of getting humanoid robots to move like humans. And while their specific approaches differ, the convergence in methodology is striking enough to suggest something important about where this field is heading.

What are these papers actually proposing?

Let me break down the three approaches, because the technical details matter here.

MuGen, posted to arXiv this week, uses vector-quantized variational autoencoders (VQ-VAEs) trained with model-based reinforcement learning. The key idea is a "generative representation of locomotion" learned from what the authors describe as "hours of heterogeneous human performance data." The authors use a teacher-student learning framework with a new policy distillation strategy. The result is a robot that can track and mimic human motions it has never seen before.
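To make the "vector-quantized" part concrete: a VQ-VAE encoder produces continuous latent vectors, and the quantization step snaps each one to its nearest entry in a learned codebook. This turns a motion clip into a sequence of discrete tokens. The sketch below shows only that nearest-neighbor lookup, in plain Python, with a hand-written toy codebook. It is not MuGen's code, and the names `quantize`, `codebook`, and `frames` are hypothetical. The encoder, decoder, codebook training, and the model-based RL and distillation stages are all omitted.

```python
import math
from typing import List, Sequence, Tuple


def quantize(latent: Sequence[float], codebook: List[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (index, vector) of the codebook entry nearest to `latent`.

    Distance is squared Euclidean, as in the standard VQ-VAE quantization step.
    The codebook here is a fixed toy list (hypothetical); in a real VQ-VAE it is learned.
    """
    if not codebook:
        raise ValueError("codebook must contain at least one entry")
    best_idx, best_dist = -1, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(latent, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, list(codebook[best_idx])


# Toy example: three 2-D codebook entries and three encoded "frames" of a motion clip.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.02]]
tokens = [quantize(frame, codebook)[0] for frame in frames]
print(tokens)  # → [1, 2, 0]
```

Once motion is expressed as discrete tokens like these, a policy can be conditioned on, or trained to predict, token sequences rather than raw joint trajectories. That framing is what makes large, heterogeneous motion datasets usable as training material.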

SCRIPT, detailed in another arXiv paper, takes a different architectural approach with what the researchers call a Joint Action-State-Text Diffusion Transformer (JAST-DiT). The system represents actions, physical states, and text as separate token streams, then couples them through joint attention. What caught my attention was the training regime: supervised imitation pre-training followed by reinforcement learning with hybrid rewards. The team evaluated it on the MotionMillion dataset, which contains 1,200 hours of motion data.
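Joint attention, as the SCRIPT paper describes it, lets tokens from different streams attend to one another. A minimal way to picture this is to concatenate the action, state, and text tokens into one sequence, run a single attention pass over it, and split the result back into streams. The sketch below assumes single-head scaled dot-product attention with no learned query/key/value projections, no masking, and no diffusion-timestep conditioning. `joint_attention` and the toy vectors are hypothetical illustrations, not JAST-DiT's actual architecture.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(streams: Dict[str, List[Vector]]) -> Dict[str, List[Vector]]:
    """Single-head scaled dot-product attention over the concatenation of all streams.

    Queries, keys, and values are the raw token vectors (an assumption: no learned
    projections), so every token attends to every token in every stream, its own included.
    """
    # Concatenate streams into one sequence, remembering each stream's slice.
    tokens: List[Vector] = []
    spans: Dict[str, Tuple[int, int]] = {}
    for name, toks in streams.items():
        spans[name] = (len(tokens), len(tokens) + len(toks))
        tokens.extend(toks)
    if not tokens:
        raise ValueError("at least one token is required")

    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs: List[Vector] = []
    for q in tokens:
        # Attention weights of this query over ALL tokens, regardless of stream.
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        # Output is the weighted average of all value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])

    # Split the mixed sequence back into per-stream outputs.
    return {name: outputs[start:end] for name, (start, end) in spans.items()}


# Toy example: one action token, two state tokens, one text token, all 2-D.
out = joint_attention({
    "action": [[1.0, 0.0]],
    "state": [[0.0, 1.0], [0.5, 0.5]],
    "text": [[1.0, 1.0]],
})
print(len(out["state"]))  # → 2
```

Because every output is a softmax-weighted average over all tokens, a state token's output can carry information from the text stream and vice versa, which is the kind of cross-stream coupling the term "joint attention" suggests.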
