The Map Problem Nobody's Talking About: When Your Robot's Eyes Disagree With Its Brain

Two new papers tackle the same fundamental issue: robots that can see perfectly fine but still get confused about what they're looking at.

By Mark Kowalski

2 hours ago · 6 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Why do robots still get lost in places they've been a hundred times?

I've been covering autonomy long enough to remember when the answer was always "better sensors." Then it was "more compute." Then it was "bigger neural nets." Now we're in the foundation model era, and guess what: robots are still getting confused about whether that thing in front of them is a parked car or a large shrub. The sensors got better! The compute got faster! The neural nets got enormous! And yet.

Two papers dropped this week that I think get at something important, something the press releases and demo videos tend to gloss over. The problem isn't perception anymore, not really. The problem is that modern robots have two completely different ways of understanding the world, and those two ways keep contradicting each other.

The two-channel problem

Here's what's happening under the hood of any modern autonomous system worth its salt. You've got your geometric perception stack, the stuff that's been around since the DARPA Grand Challenge days: LiDAR, depth cameras, and SLAM algorithms that build maps of physical space. This channel is boring and reliable. It knows where the walls are. It knows you can't drive through a concrete pillar.

Then you've got your foundation model channel, your GPT-4Vs and your Geminis and whatever else the kids are plugging in these days. This channel is exciting and unreliable. It can tell you that's a fire hydrant, not a bollard. It can read street signs. It can understand that the person waving their arms is probably telling you to stop.

The problem, and I've seen this movie before with sensor fusion in the 2010s, is that nobody's figured out what to do when these two channels disagree. The geometric stack says "there's definitely something there." The foundation model says "that's a pedestrian." But what if the foundation model is hallucinating? What if it's confidently wrong, which (call me old-fashioned) I've noticed these models tend to be?
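To make the disagreement concrete, here is a minimal sketch of one naive arbitration rule. It is not taken from either paper. The names `GeometricReading`, `SemanticReading`, and `arbitrate`, and the 0.8 threshold, are all hypothetical and chosen only for illustration. The rule assumes geometry is trusted for "is something there" and the foundation model is trusted only for "what is it."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeometricReading:
    """Hypothetical output of the geometric channel for one region of space."""
    occupied: bool  # does LiDAR/depth report something physically there?


@dataclass
class SemanticReading:
    """Hypothetical output of the foundation model channel for the same region."""
    label: Optional[str]  # e.g. "pedestrian"; None if the model reports nothing
    confidence: float     # the model's self-reported score in [0, 1]


def arbitrate(geo: GeometricReading, sem: SemanticReading,
              min_confidence: float = 0.8) -> str:
    """Combine the two channels with a deliberately conservative rule.

    The threshold is an assumed value, not a recommendation.
    """
    has_label = sem.label is not None and sem.confidence >= min_confidence
    if geo.occupied and has_label:
        # Both channels agree something is there; keep the semantic label.
        return f"obstacle:{sem.label}"
    if geo.occupied:
        # Geometry sees something but the label is missing or weak:
        # treat it as an obstacle of unknown type.
        return "obstacle_unknown"
    if has_label:
        # The model names an object where geometry sees free space.
        # Flag it rather than silently picking a winner.
        return "conflict"
    return "clear"


if __name__ == "__main__":
    print(arbitrate(GeometricReading(True), SemanticReading("pedestrian", 0.93)))
    print(arbitrate(GeometricReading(True), SemanticReading("pedestrian", 0.40)))
    print(arbitrate(GeometricReading(False), SemanticReading("pedestrian", 0.95)))
```

Notice what this rule cannot do: it leans on the model's self-reported confidence, so a confidently wrong label sails straight through the first branch. That is exactly the gap described above.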

