The New Robot Training Paradigm: Let Video Models Do the Thinking

A wave of research is betting that video generation models, not traditional simulators, will teach robots how to manipulate the physical world.

By Mark Kowalski

Yesterday · 6 min read

Photo credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

I've been watching robotics researchers chase the sim-to-real dream for the better part of a decade now, and every few years someone announces they've finally cracked it. Usually they haven't. But something different is happening in the labs right now, and I'll admit it's got my attention.

The latest batch of papers coming out of MIT, Stanford, and various AI labs is converging on an idea that would have sounded ridiculous five years ago: instead of building elaborate physics simulators to train robots, just use video generation models. The same technology that makes those weird AI videos of Will Smith eating spaghetti? Turns out it might actually be useful for something.

The core insight is deceptively simple. Video models have absorbed millions of hours of footage showing how objects move, fall, stack, and interact. They've learned intuitive physics not from equations but from watching the world. So why not use that knowledge to train robots?

A system called GE-Sim 2.0 is pushing this idea hard. It's a "closed-loop video world simulator" for robotic manipulation, which basically means it generates plausible video of what would happen if a robot took a particular action, then uses that generated video to train policies. The team retrained their model on thousands of hours of real robot footage (teleoperation, contact-rich interaction, actual policy deployment) and claims it now tops the public WorldArena leaderboard at only 2 billion parameters. That's notably smaller than many general video generators, and the researchers say policies trained against its simulated rollouts actually translate into real-world gains.
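
For readers who want to see the shape of that loop, here is a minimal sketch in Python. To be clear, this is not GE-Sim's code or API: every name below (`toy_world_model`, `toy_policy`, `rollout`) is hypothetical, a "frame" is reduced to a gripper position, and the "video model" is a one-line stand-in for a learned network. It shows only what "closed-loop" means: the policy acts on generated frames, and each generated frame depends on the policy's last action. Training itself is left out.

```python
from typing import List, Tuple

# Toy stand-ins (assumptions for illustration only):
# a "frame" is just the gripper position, an "action" is a displacement.
Frame = Tuple[float, float]
Action = Tuple[float, float]


def toy_world_model(frame: Frame, action: Action) -> Frame:
    """Hypothetical stand-in for a learned video model.

    A real system would generate the next video frame(s) conditioned on
    the action; here we simply shift the gripper position.
    """
    return (frame[0] + action[0], frame[1] + action[1])


def toy_policy(frame: Frame, goal: Frame, step: float = 0.5) -> Action:
    """Hypothetical policy: move a fixed fraction of the way to the goal."""
    return (step * (goal[0] - frame[0]), step * (goal[1] - frame[1]))


def rollout(start: Frame, goal: Frame, horizon: int) -> List[Frame]:
    """Closed-loop rollout: the policy only ever sees generated frames."""
    frames = [start]
    for _ in range(horizon):
        action = toy_policy(frames[-1], goal)               # act on latest frame
        frames.append(toy_world_model(frames[-1], action))  # imagine the result
    return frames


if __name__ == "__main__":
    for frame in rollout(start=(0.0, 0.0), goal=(1.0, 1.0), horizon=3):
        print(frame)
```

In a real pipeline the rollouts would be scored and fed back into policy training, and that is exactly where the worry about simulation error creeps in: any mistake the model makes in imagining the next frame gets compounded at every step.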

Call me old-fashioned, but I remain skeptical of leaderboard claims. We've seen this movie before with self-driving cars, where simulation results looked fantastic right up until the moment they didn't. But the approach itself is interesting.
