The New Trick in Robot Learning: Train Without the Robot

A wave of research papers suggests we might finally crack the robot data problem by ditching robots entirely during training. I've seen this kind of hype before, but this time the numbers are interesting.

By Mark Kowalski

6 hours ago · 6 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Researchers are getting serious about training robots without actually using robots, and the results are good enough that I'm paying attention. A new paper called Phantom demonstrates manipulation policies trained entirely from human video demonstrations, with no teleoperation required, that reach success rates of up to 92% on real hardware with zero fine-tuning.

This is the kind of claim that would have gotten you laughed out of the room five years ago. Now it's one of half a dozen papers this month pushing the same basic idea: robot data is expensive and hard to get, human data is everywhere, so let's figure out how to bridge the gap.

Why does this matter now?

The robot learning field has been banging its head against the data wall for years. Everyone knows the problem: you need demonstrations to train policies, but collecting robot demonstrations requires robots, operators, and controlled environments. It doesn't scale. The kids building foundation models for language had the entire internet to work with. Robot researchers have been stuck with whatever they could teleoperate in their own labs.

So the field has been hunting for workarounds. Simulation was supposed to be the answer for a while (and still is, for some applications), but sim-to-real transfer remains finicky. Domain randomization helps but doesn't solve everything. The new approach is more direct: just use human video and figure out how to make it robot-compatible.

Phantom does this by converting human demonstrations into robot observation-action pairs. The authors estimate hand poses from video, inpaint the human arm out of each frame, and overlay a rendered robot arm in its place. The visual domains get aligned without ever touching a real robot during training. It's clever, maybe too clever, but a success rate of up to 92% on deformable object manipulation is hard to argue with.
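To make the data flow concrete, here is a minimal sketch of that conversion as I understand it from the description above. This is not Phantom's code. Every function name is hypothetical, the "inpainting" is a crude mean-color fill standing in for a real inpainting model, and I'm assuming that hand positions, arm masks, and robot renders have already been produced by upstream models that aren't shown. Treating the action as the change in hand position between consecutive frames is also my assumption.

```python
import numpy as np


def inpaint_masked(frame, mask):
    """Crude stand-in for inpainting (assumption): fill the masked
    pixels with the mean color of the unmasked pixels."""
    out = frame.copy()
    out[mask] = frame[~mask].mean(axis=0)
    return out


def overlay(frame, render, render_mask):
    """Paste the rendered robot pixels over the frame where the mask is set."""
    out = frame.copy()
    out[render_mask] = render[render_mask]
    return out


def poses_to_actions(hand_positions):
    """Assumed action definition: the delta between consecutive
    hand positions, shape (T - 1, 3) for T input positions."""
    return np.diff(hand_positions, axis=0)


def human_video_to_robot_pairs(frames, arm_masks, hand_positions,
                               robot_renders, robot_masks):
    """Turn a human demo into (observation, action) pairs.

    frames, robot_renders: float arrays of shape (T, H, W, 3)
    arm_masks, robot_masks: bool arrays of shape (T, H, W)
    hand_positions: float array of shape (T, 3)
    """
    actions = poses_to_actions(hand_positions)
    pairs = []
    for t in range(len(actions)):
        # Remove the human arm, then draw the robot arm in its place.
        clean = inpaint_masked(frames[t], arm_masks[t])
        obs = overlay(clean, robot_renders[t], robot_masks[t])
        pairs.append((obs, actions[t]))
    return pairs
```

A demo with T frames yields T - 1 training pairs, since the last frame has no following pose to compute a delta from. The interesting engineering lives in the pieces this sketch stubs out: the pose estimator, the inpainting model, and the robot renderer.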
