Reinforcement learning is back, and this time the robots might actually learn something

After years of imitation learning dominance, RL is staging a quiet comeback in robotics. I've seen this pendulum swing before.

24 May 2026 · 6 min read

Twelve point three percent. That's the improvement on long-horizon tasks that a new framework called Agentic-VLA claims over existing methods, and honestly, that number stopped me cold when I read it. Not because it's huge (it isn't), but because of what it represents: reinforcement learning crawling back from the grave in robotics.

I've been covering tech long enough to watch entire methodologies get declared dead and then resurrect themselves when nobody's looking. Neural networks in the 90s, anyone? So when DeepMind's blog quietly announced that their internal benchmarks show offline RL methods reaching parity with imitation learning, my first thought was: here we go again.

The imitation learning era (and its limits)

For the past few years, the robotics world has been absolutely drunk on imitation learning. The pitch was seductive: why bother with complex reward engineering when you can just show a robot what to do? Collect demonstrations, train a model, deploy. The kids building these systems (and I say that with affection, mostly) grew up in an era where data was cheap and compute was cheaper.

But here's the thing about imitation learning that the hype cycle conveniently forgot to mention: it doesn't generalize well. You train a robot to pick up a red cup in a lab with specific lighting and a specific table height, and then you put it in a warehouse with fluorescent lights and suddenly it's useless. The robot learned to mimic, not to understand.

This isn't a new problem, by the way. I remember writing nearly identical paragraphs about expert systems in the 80s, about how they'd fail the moment you stepped outside their narrow domain. Call me old-fashioned, but I think there's something to be said for systems that actually learn principles rather than just copying homework.
