The Quiet Revolution in Robot Planning: Why Gradients Might Finally Be Ready for Prime Time

Two new papers suggest we've been solving the wrong problem in model predictive control. I'm cautiously optimistic, but let me explain why the caveats matter.

27 May 2026 · 7 min read

I've been skeptical of gradient-based planning for robots for years. There, I said it. Every time someone pitched me on backpropagating through a learned world model to plan robot actions, I'd nod politely and think about all the ways it tends to fall apart in practice. Gradient-free methods (think evolutionary strategies, random shooting, cross-entropy methods) have dominated real-world robot planning for good reason: they're robust to the messy, non-convex landscapes that learned models create.

But two papers dropped recently that are making me reconsider. Not completely, I should note. I'm not doing a full 180 here. But enough that I think it's worth walking through what's changed and why it matters for anyone building robots that need to plan under uncertainty.

The core problem, if you're not steeped in this stuff: When a robot needs to figure out what to do next, it can either follow a pre-learned policy (fast but inflexible) or plan in real-time using some model of how the world works (flexible but computationally expensive). Model Predictive Control, or MPC, is the planning approach. You simulate forward, see what happens, pick the best action sequence. The question is how you search through possible actions.

Gradient-free methods sample lots of random action sequences and pick the best ones. It's embarrassingly parallel and doesn't care if your model is differentiable. Gradient-based methods, in theory, should be more efficient because they can follow the slope of the objective function directly to good solutions. In practice, they've historically been finicky, getting stuck in local optima or exploding when gradients become unstable.
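To make the contrast concrete, here's a minimal sketch of both search styles on a toy problem. Everything in it is my own assumption for illustration: the 1-D point-mass dynamics stand in for a learned world model, and the names (`rollout_return`, `plan_cem`, `plan_gradient`), the goal, the horizon, and the step sizes are made up, not taken from either paper.

```python
import numpy as np

GOAL, HORIZON = 1.0, 10  # illustrative values, not from any paper

def rollout_return(x0, actions):
    # Toy stand-in for a world model: x_{t+1} = x_t + a_t.
    # Return is negative squared distance to the goal plus a small action cost.
    xs = x0 + np.cumsum(actions)
    return -np.sum((xs - GOAL) ** 2) - 0.01 * np.sum(actions ** 2)

def plan_cem(x0, rng, iters=10, pop=256, n_elite=32):
    # Gradient-free: sample action sequences, keep the best, refit, repeat.
    mu, sigma = np.zeros(HORIZON), np.ones(HORIZON)
    for _ in range(iters):
        samples = mu + sigma * rng.standard_normal((pop, HORIZON))
        scores = np.array([rollout_return(x0, s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

def plan_gradient(x0, steps=200, lr=0.01):
    # Gradient-based: follow the slope of the return with respect to actions.
    actions = np.zeros(HORIZON)
    for _ in range(steps):
        err = x0 + np.cumsum(actions) - GOAL
        # Action a_k shifts every later state, so its gradient sums later errors.
        grad = -2.0 * np.cumsum(err[::-1])[::-1] - 0.02 * actions
        actions = actions + lr * grad
    return actions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("CEM return:     ", rollout_return(0.0, plan_cem(0.0, rng)))
    print("Gradient return:", rollout_return(0.0, plan_gradient(0.0)))
```

On a smooth, concave toy objective like this one, the gradient planner needs one rollout per step while the sampler needs hundreds per iteration. The catch is that learned models rarely give you a landscape this friendly, and the gradient here is written out by hand; a real system would get it from autodiff through the model.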

So what's new?

The first paper, from a team working on what they call Dream-MPC, takes a hybrid approach that I initially thought was just, well, obvious? But the details matter. They generate a small number of candidate trajectories by rolling out a learned policy, then refine each trajectory using gradient ascent through a learned world model. The key innovations are uncertainty regularization (penalizing plans that venture into parts of state space where the model is unreliable) and amortizing optimization across time steps by reusing previously optimized actions.
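Here's roughly how I picture that loop, as a sketch under heavy assumptions rather than the authors' implementation. The toy dynamics, the proportional-controller "policy", the `uncertainty` function (a stand-in for something like ensemble disagreement), the penalty weight `BETA`, and the finite-difference gradient are all hypothetical placeholders I picked so the example runs with nothing but NumPy.

```python
import numpy as np

GOAL, HORIZON, BETA = 1.0, 8, 0.5  # illustrative values only

def model_step(x, a):
    # Stand-in for a learned world model (assumed): 1-D point mass.
    return x + a

def uncertainty(x):
    # Stand-in for model uncertainty (assumed): zero where |x| <= 1.5,
    # growing quadratically outside that "well-modeled" region.
    return max(0.0, abs(x) - 1.5) ** 2

def objective(x0, actions):
    # Predicted return minus an uncertainty penalty along the imagined rollout.
    x, total = x0, 0.0
    for a in actions:
        x = model_step(x, a)
        total += -(x - GOAL) ** 2 - 0.01 * a ** 2 - BETA * uncertainty(x)
    return total

def policy_rollout(x0, rng, noise=0.3):
    # Hypothetical policy: a noisy proportional controller proposes a plan.
    x, actions = x0, []
    for _ in range(HORIZON):
        a = 0.5 * (GOAL - x) + noise * rng.standard_normal()
        actions.append(a)
        x = model_step(x, a)
    return np.array(actions)

def grad(x0, actions, eps=1e-4):
    # Central finite differences; a real system would use autodiff instead.
    g = np.zeros_like(actions)
    for i in range(len(actions)):
        d = np.zeros_like(actions)
        d[i] = eps
        g[i] = (objective(x0, actions + d) - objective(x0, actions - d)) / (2 * eps)
    return g

def plan(x0, rng, prev_plan=None, n_candidates=4, steps=20, lr=0.01):
    # A few policy proposals, each refined by gradient ascent on the objective.
    candidates = [policy_rollout(x0, rng) for _ in range(n_candidates)]
    if prev_plan is not None:
        # Amortization: reuse last step's optimized plan, shifted by one action.
        candidates.append(np.append(prev_plan[1:], 0.0))
    refined = []
    for a in candidates:
        for _ in range(steps):
            a = a + lr * grad(x0, a)
        refined.append(a)
    return max(refined, key=lambda a: objective(x0, a))
```

In a receding-horizon loop you would call `plan`, execute only the first action, then pass the returned plan back in as `prev_plan` on the next step, so each new optimization starts from something already decent instead of from scratch.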
