Reinforcement Learning Is Finally Getting Serious About Robot Navigation

A batch of new papers suggests the field is moving past toy problems, but I've seen this movie before.

10 June 2026 · 3 min read

Five papers crossed my desk this week, all on reinforcement learning for robot navigation, and I'll be honest, I almost didn't write this up. We've been hearing about RL solving navigation "any day now" since before I left Kuka. But something feels different this time, and I think it's worth talking about.

The common thread across these papers is hierarchical control. Split the problem into a high-level policy that figures out where to go, and a low-level policy that handles the actual motor commands. This isn't new (we were doing something similar with behaviour trees back in 2015), but the execution has gotten dramatically better.

Take the underwater vehicle work from arXiv. They've got a high-level policy running at 2 Hz that processes camera and sonar data and generates spatial subgoals for a low-level controller running at 10 Hz. The results are within 4% to 6% of RRT* planning baselines. That's not perfect, but for an end-to-end learned system it's pretty good. When I was at Kuka, we would have killed for that kind of performance from a learning system.
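To make the two-rate idea concrete, here is a minimal sketch of the loop structure. It is not the paper's code: both policies are hand-written stand-ins for what would be learned networks, the robot is a point on a 2D plane, and names like `high_level_policy`, `low_level_policy`, `max_reach` and `gain` are my own assumptions. The only numbers taken from the paper are the 2 Hz and 10 Hz rates.

```python
import math

HIGH_HZ = 2   # high-level rate quoted above
LOW_HZ = 10   # low-level rate quoted above
TICKS_PER_SUBGOAL = LOW_HZ // HIGH_HZ  # 5 low-level ticks per subgoal


def high_level_policy(position, goal, max_reach=1.0):
    """Stand-in for the learned planner (assumption): pick a subgoal
    at most max_reach away, on the straight line toward the goal."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= max_reach:
        return goal
    scale = max_reach / dist
    return (position[0] + dx * scale, position[1] + dy * scale)


def low_level_policy(position, subgoal, gain=0.5):
    """Stand-in for the learned controller (assumption): a proportional
    step that closes a fixed fraction of the gap to the subgoal."""
    return (gain * (subgoal[0] - position[0]),
            gain * (subgoal[1] - position[1]))


def run(start, goal, seconds=10):
    """Simulate the hierarchy; returns final position and call counts."""
    position, subgoal = start, start
    high_calls = low_calls = 0
    for tick in range(seconds * LOW_HZ):
        # The slow loop only fires on every fifth fast tick.
        if tick % TICKS_PER_SUBGOAL == 0:
            subgoal = high_level_policy(position, goal)
            high_calls += 1
        step_x, step_y = low_level_policy(position, subgoal)
        position = (position[0] + step_x, position[1] + step_y)
        low_calls += 1
    return position, high_calls, low_calls
```

Calling `run((0.0, 0.0), (5.0, 0.0))` simulates ten seconds: the planner is consulted 20 times and the controller 100 times. The point of the split is that the expensive perception-heavy policy gets five controller ticks of breathing room between decisions.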

The GHOST framework from another team takes this further for manipulation. Their insight is that subgoals (basically, where should the end-effector be next?) are largely embodiment-agnostic. So you can train the high-level policy on human video demonstrations, then let the low-level policy figure out how to actually execute those goals on whatever robot you've got. I called my old colleague at Siemens about this, and he pointed out that this could take away a lot of the data collection burden we've been complaining about for years.
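Here is a rough sketch of why an embodiment-agnostic subgoal is convenient, in interface terms. None of this comes from the GHOST paper: `next_subgoal` is a trivial stand-in for a high-level policy trained on human video, and `VelocityArm` and `StepperGantry` are two hypothetical robots I made up to show the same subgoal being turned into very different commands.

```python
from typing import List, Protocol, Tuple

Point = Tuple[float, float, float]  # end-effector x, y, z in metres


def next_subgoal(current: Point, target: Point, max_step: float = 0.05) -> Point:
    """Stand-in for the high-level policy (assumption): move at most
    max_step metres per axis toward the target. Knows nothing about
    which robot will execute the result."""
    return tuple(
        c + max(-max_step, min(max_step, t - c))
        for c, t in zip(current, target)
    )


class LowLevelController(Protocol):
    """What every embodiment has to provide."""
    def command_for(self, current: Point, subgoal: Point) -> List[float]: ...


class VelocityArm:
    """Hypothetical robot A: takes Cartesian velocities in m/s."""
    def __init__(self, dt: float = 0.1):
        self.dt = dt

    def command_for(self, current: Point, subgoal: Point) -> List[float]:
        return [(s - c) / self.dt for c, s in zip(current, subgoal)]


class StepperGantry:
    """Hypothetical robot B: takes integer motor steps per axis."""
    def __init__(self, steps_per_metre: int = 2000):
        self.steps_per_metre = steps_per_metre

    def command_for(self, current: Point, subgoal: Point) -> List[int]:
        return [round((s - c) * self.steps_per_metre)
                for c, s in zip(current, subgoal)]
```

Both robots consume the output of the same `next_subgoal` call; only `command_for` differs. In a learned system the low-level side would be a trained policy rather than a one-line formula, but the division of labour is the same, and it is the high-level half that could plausibly be trained without robot data.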

The SARM2 paper tackles what I think is the real bottleneck: reward design. They've built a multi-task reward model that can evaluate whether a robot is making progress on long-horizon manipulation tasks. On their benchmark, success rates rose from 58% to 100% on folding shorts and from 50% to 90% on cleaning whiteboards. Those are real improvements, though I'd want to see this replicated outside their specific benchmark before getting too excited.
