Self-driving AI explains its decisions. Turns out, it's lying half the time.

New research finds that when autonomous driving models tell you why they're doing something, there's a coin-flip chance they're making it up.

By Sarah Williams

3 hours ago · 6 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Forty-two point five percent. That's the number that's been stuck in my head since I read a new study on autonomous driving AI this week. It's the "reasoning fidelity" rate for a vision-language-action model called Alpamayo-R1-10B, and what it means is this: when the AI explains why it's making a driving decision, its explanation matches reality less than half the time.

Let me say that differently, because I think it's important. These systems can now generate detailed, confident-sounding explanations for their behavior. They'll tell you they stopped because they saw a pedestrian, or they changed lanes because of an obstacle. And according to researchers who tested 300 inferences across 100 driving scenarios, those explanations are basically a coin flip.

I initially thought this was maybe a niche concern, something for AI safety researchers to worry about in academic papers. But then I started thinking about where we're headed. Autonomous vehicles are increasingly being pitched as trustworthy because they can explain themselves. Regulators are asking for interpretability. Insurance companies want to know why a car did what it did. And now we have evidence that the explanations might be, well, fabricated.

The study, published on arXiv, introduces what the authors call "Chain-of-Causation" analysis. Basically, they're checking whether the AI's stated reasoning actually corresponds to what's happening in the scene. The results are honestly kind of alarming. In one-third of scenarios involving pedestrians, the model missed pedestrians, 94 in total. It just didn't register them in its reasoning at all, even when they were clearly present.

You might be wondering: okay, but does it still drive safely even if its explanations are wrong? That's where it gets worse. The researchers found only 48.3% mean reasoning-action consistency. More than half of inferences showed low consistency between what the model said it would do and what it actually did. Here's the stat that really got me: in 37.9% of cases where the model claimed it was stopping, it actually continued driving.
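
To make that last stat concrete, here's a minimal sketch of how a number like it could be computed. This is not the researchers' code: the `Inference` record, its field names, the `contradiction_rate` helper, and the toy data are all my own assumptions, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Inference:
    """One model output (hypothetical format, not taken from the study)."""
    stated_action: str    # action named in the model's explanation, e.g. "stop"
    executed_action: str  # action the model actually took, e.g. "continue"


def contradiction_rate(inferences, stated="stop"):
    """Share of inferences claiming `stated` whose executed action differs.

    Returns None when no inference claims that action.
    """
    claimed = [inf for inf in inferences if inf.stated_action == stated]
    if not claimed:
        return None
    mismatched = sum(1 for inf in claimed if inf.executed_action != stated)
    return mismatched / len(claimed)


# Toy data, invented for illustration only (not results from the paper).
sample = [
    Inference("stop", "stop"),
    Inference("stop", "continue"),
    Inference("stop", "stop"),
    Inference("stop", "continue"),
    Inference("change_lane", "change_lane"),
]

rate = contradiction_rate(sample)
print(f"Claimed a stop but did something else: {rate:.1%}")
```

The thing to notice is the denominator: only the inferences where the model said it was stopping count, not every inference. That's how I read "cases where the model claimed it was stopping," though how the study decides what counts as a stop claim is something this sketch doesn't attempt to capture.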
