The End of Robot Data Collection as We Know It? Two Papers Suggest Human Videos Might Be Enough

New research shows robots learning manipulation skills directly from watching humans, no expensive teleoperation required. I'm cautiously optimistic, but let's look at what's actually happening here.

10 June 2026 · 4 min read

I've been covering humanoid robotics long enough to develop a healthy skepticism about "breakthrough" claims. So when two papers dropped this week suggesting robots can learn dexterous manipulation just by watching human videos, my first instinct was to look for the catch.

Honestly? I'm still looking. But the results are compelling enough that I think we need to talk about what's happening here.

The Data Problem Everyone's Been Dancing Around

Here's the thing about robot learning that doesn't get enough attention: there's almost no training data. Language models train on the entire internet. Vision models have billions of images. Robotics? We're scraping together datasets through painstaking teleoperation, where a human operator controls a robot arm for hours to collect maybe a few hundred demonstrations of a single task.

This is why progress has been so uneven. It's not that the algorithms are bad. It's that we're trying to teach robots to interact with the physical world using datasets that would make a 2015 image classifier laugh.

Two new papers on arXiv are attacking this problem from slightly different angles, and both arrive at a similar conclusion: maybe we've been overthinking the embodiment gap.

Ego-Pi: Teaching Robots Through Human Eyes

The first paper, Ego-Pi, builds on Physical Intelligence's π₀.₅ model (which, if you're not following this space closely, is one of the more capable vision-language-action models out there). The core idea is deceptively simple: fine-tune the model on egocentric human video, the kind of first-person footage you'd get from someone wearing a GoPro while cooking or assembling furniture.
