Diffusion policies are having a moment, but I've seen this movie before

A wave of papers promises to make robot learning faster, cheaper, and more robust. Some of it might even be true.

5 hours ago · 6 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

So here's the question everyone in robotics should be asking: are diffusion policies actually the breakthrough we've been waiting for, or is this another case of the field falling in love with a hammer and seeing nails everywhere?

I've been covering tech long enough to remember when neural networks were going to solve everything (they didn't), when deep learning was going to solve everything (closer, but still no), and when transformers were going to solve everything (jury's still out). Now diffusion models, which started life making pretty pictures, have wandered into robotics and everyone's losing their minds. A bunch of recent papers suggest they might actually deserve some of the hype this time. Call me old-fashioned, but I want to see the receipts.

The pitch

The basic idea is elegant enough that even I can explain it. Instead of training a robot to output a single best-guess action, you train it to model a whole distribution of possible actions, then sample from that distribution by starting with random noise and denoising your way, step by step, to something useful. It's borrowed from image generation, where diffusion models learned to turn static into art. The promise is that robots trained this way can handle ambiguity better, generalize to new situations, and learn from messier demonstrations.
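For the curious, here is a toy sketch of that denoising loop, not anyone's actual implementation. It assumes a one-dimensional action and demonstrations that split evenly between two choices, -1.0 (go left) and +1.0 (go right). Under that assumption the ideal denoiser has a closed form, so the hypothetical `predict_noise` function stands in for the neural network a real system would train. The schedule values and function names are made up for illustration, and real diffusion policies typically also condition on camera and sensor observations, which this leaves out.

```python
import math
import random

def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.2):
    """Linear noise schedule: per-step betas and cumulative alpha-bars."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def predict_noise(noisy_action, alpha_bar):
    """Stand-in for the learned denoiser (hypothetical, for illustration).

    Assumes the demonstrations contain exactly two equally likely actions,
    -1.0 and +1.0, so the best guess of the clean action has a closed form.
    """
    clean_guess = math.tanh(math.sqrt(alpha_bar) * noisy_action / (1.0 - alpha_bar))
    return (noisy_action - math.sqrt(alpha_bar) * clean_guess) / math.sqrt(1.0 - alpha_bar)

def sample_action(rng, betas, alpha_bars):
    """Start from pure noise and denoise step by step (DDPM-style update)."""
    action = rng.gauss(0.0, 1.0)
    for t in reversed(range(len(betas))):
        beta, alpha_bar = betas[t], alpha_bars[t]
        noise_guess = predict_noise(action, alpha_bar)
        # Remove the predicted noise for this step, then rescale.
        action = (action - beta / math.sqrt(1.0 - alpha_bar) * noise_guess) / math.sqrt(1.0 - beta)
        if t > 0:  # no fresh noise on the final step
            action += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return action

rng = random.Random(0)
betas, alpha_bars = make_schedule()
samples = [sample_action(rng, betas, alpha_bars) for _ in range(200)]
went_left = sum(1 for a in samples if a < 0)
print(f"left: {went_left}, right: {len(samples) - went_left}")
```

Every sample lands on one of the two demonstrated actions, roughly half on each side. A policy trained to output a single averaged action would settle near 0.0, which nobody demonstrated. That, in miniature, is the "handles ambiguity" claim.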

A paper from researchers working on something called DIPOLE claims their approach outperforms six baselines by 39.1% on average across 18 simulated and 4 real-world tasks. That's a big number! They're fusing vision and geometry through what they call "modality-wise dropout," which basically means they randomly blind the robot to one input stream during training so it learns to rely on either one. The gains under visual distractors (41.5% improvement) and randomized object placement (15.2%) are the numbers that matter here, because that's where robots actually fail in the real world.
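To make "modality-wise dropout" concrete, here is a minimal sketch of one plausible reading of the idea: during training, occasionally zero out an entire input stream before the two are fused. This is an assumption for illustration, not the DIPOLE authors' code. The function name, the `drop_prob` parameter, the choice to drop at most one stream at a time, and fusion by simple concatenation are all hypothetical.

```python
import random

def fuse_with_modality_dropout(vision_feats, geometry_feats, drop_prob, rng, training=True):
    """Concatenate two feature lists, sometimes blanking one whole stream.

    Hypothetical sketch: with probability drop_prob (training only), one
    modality is replaced by zeros. Both are never dropped together.
    """
    if training and rng.random() < drop_prob:
        if rng.random() < 0.5:
            vision_feats = [0.0] * len(vision_feats)      # blind the camera stream
        else:
            geometry_feats = [0.0] * len(geometry_feats)  # blind the geometry stream
    return list(vision_feats) + list(geometry_feats)

rng = random.Random(0)
vision = [0.3, 0.7]          # made-up image features
geometry = [1.0, 2.0, 3.0]   # made-up geometric features
for _ in range(3):
    print(fuse_with_modality_dropout(vision, geometry, drop_prob=0.5, rng=rng))
```

At test time you would call it with `training=False`, so the policy sees both streams. The bet is that a network forced to cope with either stream alone won't fall apart when one of them gets noisy.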

