The Robotics Industry Has Rediscovered Imagination, and I've Seen This Before

A wave of 'world model' papers promises robots that can think ahead. It's promising work, but let's not pretend this is the first time we've heard that pitch.

By Mark Kowalski

9 hours ago · 6 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Most of the coverage I've seen this week treats the new crop of vision-language-action world models as some kind of breakthrough moment. Robots that can imagine the future! Predict what happens next! Plan accordingly! The breathless headlines write themselves.

But call me old-fashioned: I've seen this movie before. The idea that robots should build internal models of their environment and use those models to plan ahead isn't new. It's been kicking around since the 1980s, back when I was covering entirely different tech and roboticists were arguing about whether symbolic reasoning or reactive behaviors would win out. Neither did, of course, and now we're back to something that looks suspiciously like the old "world model" concept, just dressed up in neural network formalisms and diffusion models.

That doesn't mean the new work isn't interesting. It is! But let's be precise about what's actually happening here.

What the papers actually show

The past few weeks have seen a genuine cluster of papers exploring variations on a theme: give a robot the ability to "imagine" future states of the world before committing to an action. An updated version of RynnVLA-002 has been posted to arXiv; it combines a vision-language-action (VLA) model with a world model that predicts future image states. The claim is that these two components enhance each other: the VLA produces actions, the world model imagines what happens next, and together they achieve a 97.4% success rate on the LIBERO simulation benchmark without pretraining.

That's a strong number, if it holds up. In real-world LeRobot experiments, the integrated world model reportedly boosted success rates by 50%. But here's the thing: we don't know yet how well these results generalize to messier, less controlled environments. The paper is clear about testing in simulation and specific real-world setups, but the gap between "works in the lab" and "works in your warehouse" remains as wide as ever.
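
For readers who want the "imagine before acting" idea in concrete terms, here's a deliberately tiny sketch. To be clear, this is not RynnVLA-002's architecture or code. It's a toy I made up for illustration: the "world" is a gripper position on a one-dimensional track, and every name in it (`propose_actions`, `imagine`, `choose_action`, `run_episode`) is hypothetical. The real systems predict future images with learned networks; this one predicts an integer with addition.

```python
# Toy illustration only: all names and the 1-D "world" are assumptions,
# not taken from any of the papers discussed in this article.

ACTIONS = (-1, 0, 1)  # move left, stay put, move right


def propose_actions(state: int) -> tuple[int, ...]:
    """Stand-in for the action model: here it simply offers every action."""
    return ACTIONS


def imagine(state: int, action: int) -> int:
    """Stand-in for the world model: predict the next state for an action."""
    return state + action


def choose_action(state: int, goal: int) -> int:
    """Pick the proposed action whose imagined outcome lands closest to the goal."""
    return min(propose_actions(state), key=lambda a: abs(imagine(state, a) - goal))


def run_episode(start: int, goal: int, max_steps: int = 20) -> list[int]:
    """Repeat the imagine-then-act loop and return the states visited."""
    state = start
    visited = [start]
    for _ in range(max_steps):
        if state == goal:
            break
        action = choose_action(state, goal)
        # Toy assumption: the real world behaves exactly as the model predicts.
        state = imagine(state, action)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(run_episode(start=0, goal=3))  # prints [0, 1, 2, 3]
```

Notice the assumption buried in the comment inside `run_episode`: the world does exactly what the model predicts. In a toy, that's free. On a real robot, it's the whole question.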

