Reinforcement Learning Is Finally Getting Serious About Robot Navigation
A batch of new papers suggests the field is moving past toy problems, but I've seen this movie before.
Five papers crossed my desk this week, all on reinforcement learning for robot navigation, and I'll be honest, I almost didn't write this up. We've been hearing about RL solving navigation "any day now" since before I left Kuka. But something feels different this time, and I think it's worth talking about.
The common thread across these papers is hierarchical control. Split the problem into a high-level policy that figures out where to go, and a low-level policy that handles the actual motor commands. This isn't new (we were doing something similar with behaviour trees back in 2015), but the execution has gotten dramatically better.
Take the underwater vehicle paper on arXiv. It has a high-level policy running at 2 Hz that processes camera and sonar data and generates spatial subgoals for a low-level controller running at 10 Hz. The results are within 4% to 6% of RRT* planning baselines. That's not perfect, but for an end-to-end learned system? That's actually pretty good. When I was at Kuka, we would have killed for that kind of performance from a learning system.
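For readers who want to see what a two-rate loop looks like, here is a minimal sketch in Python. Only the 2 Hz and 10 Hz rates come from the paper as described above. Everything else is my own stand-in: `high_level_policy`, `low_level_controller`, the 2D point robot, and the gain and step-size values are hypothetical hand-written placeholders for what would be learned networks driven by camera and sonar input.

```python
import math

# Rates as reported for the underwater vehicle work; everything else is assumed.
HIGH_LEVEL_HZ = 2
LOW_LEVEL_HZ = 10
STEPS_PER_SUBGOAL = LOW_LEVEL_HZ // HIGH_LEVEL_HZ  # low-level ticks per subgoal


def high_level_policy(position, goal, max_step=1.0):
    """Hypothetical stand-in for the learned high-level policy.

    Returns a spatial subgoal at most `max_step` away, in the direction of the goal.
    """
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return goal
    scale = max_step / dist
    return (position[0] + dx * scale, position[1] + dy * scale)


def low_level_controller(position, subgoal, gain=0.5):
    """Hypothetical stand-in for the learned low-level controller.

    Takes one proportional step from the current position toward the subgoal.
    """
    return (
        position[0] + gain * (subgoal[0] - position[0]),
        position[1] + gain * (subgoal[1] - position[1]),
    )


def run(start, goal, seconds=10):
    """Simulate the loop: the subgoal is refreshed on every fifth low-level tick."""
    position = start
    subgoal = start
    subgoal_updates = 0
    for tick in range(seconds * LOW_LEVEL_HZ):
        if tick % STEPS_PER_SUBGOAL == 0:
            subgoal = high_level_policy(position, goal)
            subgoal_updates += 1
        position = low_level_controller(position, subgoal)
    return position, subgoal_updates


final_position, updates = run(start=(0.0, 0.0), goal=(3.0, 4.0))
print(final_position, updates)
```

The point of the split is that the slow loop only ever has to answer "where next?", while the fast loop only has to answer "how do I get to that nearby point?". In this toy version both answers are trivial; in the paper's setting both would presumably be learned.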
The GHOST framework from another team takes this further for manipulation. Their insight is that sub-goals (basically, where should the end-effector be next?) are largely embodiment-agnostic. So you can train the high-level policy on human video demonstrations, then let the low-level policy figure out how to actually execute those goals on whatever robot you've got. I called my old colleague at Siemens about this, and he pointed out that it could take away a lot of the data collection burden we've been complaining about for years.
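To make "embodiment-agnostic" concrete, here is a rough sketch of the idea, not of GHOST itself. I'm assuming a planar (x, y) end-effector subgoal and two made-up robots: a Cartesian gantry and a two-link arm. The function names, link lengths, and the hand-written inverse kinematics are all my own illustrative assumptions, standing in for per-robot low-level policies that would be learned in practice.

```python
import math


def gantry_low_level(subgoal):
    """Hypothetical gantry: axis positions are the end-effector coordinates."""
    x, y = subgoal
    return {"axis_x": x, "axis_y": y}


def two_link_low_level(subgoal, l1=1.0, l2=1.0):
    """Hypothetical two-link planar arm: analytic inverse kinematics.

    Returns one of the two joint-angle solutions that reach the subgoal.
    """
    x, y = subgoal
    cos_q2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    cos_q2 = max(-1.0, min(1.0, cos_q2))  # clamp against rounding error
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return {"q1": q1, "q2": q2}


def two_link_forward(joints, l1=1.0, l2=1.0):
    """Forward kinematics, used here only to check where the arm ends up."""
    q1, q2 = joints["q1"], joints["q2"]
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return (x, y)


# The same subgoal, as a high-level policy might emit it, sent to both robots.
subgoal = (1.0, 0.5)
gantry_command = gantry_low_level(subgoal)
arm_command = two_link_low_level(subgoal)
print(gantry_command)
print(arm_command, two_link_forward(arm_command))
```

The commands look nothing alike, but both robots end up with the end-effector at the same point. That is the property that would let a high-level policy trained on human video stay unchanged while only the low-level part is swapped per robot.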
The SARM2 paper tackles what I think is the real bottleneck: reward design. They've built a multi-task reward model that can evaluate whether a robot is making progress on long-horizon manipulation tasks. On their benchmark, success rates rose from roughly 50% to 90% or better: folding shorts went from 58% to 100%, and cleaning whiteboards from 50% to 90%. Those are real improvements, though I'd want to see this replicated outside their specific benchmark before getting too excited.
