Two New Papers Try to Fix What Imitation Learning Gets Wrong About Robot Planning

FlowMPC and WAM-RL both attack the same core limitation of behavior cloning from different angles. Here's what the research actually shows.

17 June 2026 · 9 min read

Picture a robot arm that has watched thousands of demonstrations of a pick-and-place task. It has learned, statistically, what movements tend to follow what observations. And then, at test time, it fails to pick up an object it has seen dozens of times before, because the lighting shifted slightly, or the object landed at an angle just outside the training distribution. This is not a hypothetical. It is the central, persistent problem with imitation learning in robotics, and it is the problem that two new preprints, both posted to arXiv in the last week, are trying to address.

The papers, FlowMPC: Improving Flow Matching policies with World Models and WAM-RL: World-Action Model Reinforcement Learning with Reconstruction Rewards and Online Video SFT, come at this from different directions. FlowMPC asks whether a world model can improve a flow-based policy at test time, without touching the training objective. WAM-RL asks whether reinforcement learning can be injected into the World-Action paradigm to let a robot keep improving through real interaction. Neither paper claims to have solved imitation learning. But together they sketch out where the field is currently pushing.
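The FlowMPC framing, a pretrained flow-based policy plus a world model consulted only at test time, maps onto a familiar sample-then-select pattern. What follows is a generic sketch of that pattern under stated assumptions. It is not the paper's algorithm, which this article does not spell out. Candidate action chunks are drawn by Euler-integrating a velocity field from Gaussian noise (using the convention that t = 0 is noise and t = 1 is the action). Each candidate is rolled through a world model, and the lowest-cost chunk is kept. `toy_velocity_field`, `toy_world_model`, and `goal_cost` are hypothetical stand-ins for a trained network, a learned dynamics model, and a task objective.

```python
import numpy as np

def sample_flow_policy(velocity_field, obs, n_samples, horizon, act_dim, n_steps, rng):
    """Draw action chunks by Euler-integrating dA/dt = v(A, obs, t) from t=0 (noise) to t=1."""
    actions = rng.standard_normal((n_samples, horizon, act_dim))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        actions = actions + dt * velocity_field(actions, obs, k * dt)
    return actions

def select_with_world_model(candidates, obs, world_model, cost):
    """Roll each candidate chunk through the world model; return the cheapest chunk and all scores."""
    scores = []
    for plan in candidates:
        state, total = obs, 0.0
        for action in plan:
            state = world_model(state, action)
            total += cost(state)
        scores.append(total)
    return candidates[int(np.argmin(scores))], scores

# --- Hypothetical stand-ins so the sketch runs end to end ---
GOAL = np.array([1.0, -1.0])

def toy_velocity_field(actions, obs, t):
    # Pulls samples toward a nominal action; a real policy would be a trained network.
    return 0.5 * (np.array([0.2, -0.2]) - actions)

def toy_world_model(state, action):
    # Point-mass dynamics: each action is treated as a displacement.
    return state + action

def goal_cost(state):
    # Squared distance to the goal after each predicted step.
    return float(np.sum((state - GOAL) ** 2))

rng = np.random.default_rng(0)
obs = np.zeros(2)
candidates = sample_flow_policy(toy_velocity_field, obs, n_samples=16, horizon=5,
                                act_dim=2, n_steps=10, rng=rng)
best_plan, scores = select_with_world_model(candidates, obs, toy_world_model, goal_cost)
first_action = best_plan[0]  # what a receding-horizon (MPC) loop would execute
```

In a receding-horizon loop, only `first_action` would be executed before sampling again from the new observation. Nothing in this loop changes how the velocity field was trained, which is the "without touching the training objective" property the FlowMPC question is about.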

The problem they are both solving

Behavior cloning, at its core, is supervised learning over demonstrations. You collect expert trajectories, train a policy to imitate them, and hope the policy generalizes. Often it does not, for reasons that are well understood theoretically. Because the policy is trained only on states the expert visited, it never learns to recover from its own mistakes: once a small error pushes it into states the expert never reached, errors compound over the rest of the episode. This compounding-error problem is what DAgger, introduced by Ross, Gordon, and Bagnell in 2011, was designed to address, and it has never fully gone away.
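To make the distinction concrete, here is a minimal sketch of the DAgger idea on a hypothetical 1-D toy problem. Everything in it (the `expert`, the noisy `step` dynamics, the `NearestNeighborPolicy` learner) is an illustrative assumption, not code from either paper. The first round is plain behavior cloning on the expert's own rollout. Each later round lets the learner drive, asks the expert to label the states the learner actually reached, and refits on the aggregated data. It uses the simplest mixing schedule, in which only the first round is executed by the expert.

```python
import numpy as np

# Toy 1-D setting (assumed for illustration): the state is a scalar position,
# the expert steers it toward 0, and execution is slightly noisy.
def expert(x: float) -> float:
    return -0.5 * x

def step(x: float, a: float, rng: np.random.Generator) -> float:
    return x + a + rng.normal(scale=0.05)

class NearestNeighborPolicy:
    """Behavior-cloned policy: copy the action of the closest labeled state."""

    def __init__(self, states, actions):
        self.states = np.asarray(states, dtype=float)
        self.actions = np.asarray(actions, dtype=float)

    def __call__(self, x: float) -> float:
        return float(self.actions[np.argmin(np.abs(self.states - x))])

def rollout(policy, x0: float, horizon: int, rng) -> list[float]:
    """Return the `horizon` states visited while `policy` is in control."""
    states = [x0]
    for _ in range(horizon - 1):
        states.append(step(states[-1], policy(states[-1]), rng))
    return states

def dagger(x0: float, horizon: int, n_iters: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    # Round 1 is plain behavior cloning: states come from the expert's own rollout.
    states = rollout(expert, x0, horizon, rng)
    actions = [expert(s) for s in states]
    policy = NearestNeighborPolicy(states, actions)
    for _ in range(n_iters - 1):
        # Later rounds: the learner drives, so its own mistakes shape the state
        # distribution; the expert then labels every state the learner reached.
        visited = rollout(policy, x0, horizon, rng)
        states += visited
        actions += [expert(s) for s in visited]
        policy = NearestNeighborPolicy(states, actions)  # refit on the aggregate
    return states, actions, policy
```

Calling `dagger(x0=1.0, horizon=10, n_iters=4)` returns 40 labeled states: 10 from the expert's rollout and 30 from states the learner itself visited, which is exactly the data plain behavior cloning never sees.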
