The Self-Driving Safety Problem Nobody Wants to Talk About
A flood of new research papers promises safer autonomous vehicles through AI wizardry, but we've been here before, and the fundamental problems haven't changed.
Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
Most coverage of this week's autonomous vehicle research papers will tell you we're on the cusp of something big. That AI is finally cracking the safety problem. That large language models and reinforcement learning are about to make self-driving cars actually safe.
I've seen this movie before.
I covered the first wave of autonomous vehicle hype in the early 2010s, when Google's self-driving car was supposed to be commercially available by 2018. Then I covered the second wave, when Uber and Waymo were going to have robotaxis everywhere by 2020. Now we're in wave three, and the pitches sound remarkably similar, just with more acronyms.
So when five new research papers dropped this week, all promising various flavors of "safer autonomous driving through smarter AI," I figured it was worth actually reading them. What I found was interesting, occasionally impressive, and fundamentally missing the point.
Let's start with what the researchers are actually proposing, because some of this work is genuinely clever.
A team from multiple institutions published SARAD, which tries to solve a real problem: traditional deep reinforcement learning for autonomous driving involves a lot of random exploration, which is a polite way of saying the AI crashes into things while learning not to crash into things. Their solution combines large language models with reinforcement learning, using something called Retrieval-Augmented Generation to guide the AI's decisions based on a "dynamic expert knowledge repository" instead of pure trial and error. They also added a collision predictor trained on historical crash data.
It's a reasonable approach! The results in their Highway-Env simulator look good. But, and this is important, Highway-Env is a simplified simulation environment. How much of that performance survives on an actual highway in New Jersey during rush hour in the rain remains unclear.
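For the curious, here's roughly what "a collision predictor steering exploration" can look like in code. This is my own minimal sketch of generic action masking, not SARAD's implementation: the action names, the risk threshold, and the function `choose_action` are all invented for illustration, and the retrieval-augmented LLM part is left out entirely.

```python
import random

# Hypothetical discrete action set, loosely modeled on highway driving.
ACTIONS = ["keep_lane", "change_left", "change_right", "accelerate", "brake"]


def choose_action(q_values, collision_risk, risk_threshold=0.2,
                  epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to actions the predictor calls safe.

    q_values and collision_risk are dicts keyed by action name. The risk
    numbers stand in for the output of a learned collision predictor.
    """
    safe = [a for a in ACTIONS if collision_risk[a] <= risk_threshold]
    if not safe:
        # Nothing clears the bar: fall back to the least risky action.
        return min(ACTIONS, key=lambda a: collision_risk[a])
    if rng.random() < epsilon:
        # Explore, but only among actions predicted to be safe.
        return rng.choice(safe)
    # Exploit: best estimated value among the safe actions.
    return max(safe, key=lambda a: q_values[a])


# Made-up numbers: the lane change scores highest but looks dangerous.
q = {"keep_lane": 0.5, "change_left": 0.9, "change_right": 0.2,
     "accelerate": 0.6, "brake": 0.1}
risk = {"keep_lane": 0.05, "change_left": 0.7, "change_right": 0.1,
        "accelerate": 0.15, "brake": 0.01}

print(choose_action(q, risk, epsilon=0.0))
```

With exploration switched off, the function skips the higher-scoring but risky lane change and picks `accelerate`. The catch is the same one as above: the filter is only as good as the predictor feeding it.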
Another paper introduced something called Differentiable Model Predictive Safety, or DMPS, which tackles a genuinely hard problem: what happens when you have autonomous vehicles and mobile robots and regular cars and pedestrians all trying to navigate the same unregulated intersection. Their approach lets AI agents predict future trajectories and evaluate risk, then make "minimal and precise online safety corrections."
The headline result is that they reduced collisions to "less than 5.6%" in high-density mixed traffic simulations. Which sounds great until you think about it for more than a few seconds. A 5.6% collision rate in simulation! Call me old-fashioned, but I'd want to see that number a lot closer to zero before anyone puts this on a real street.
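To see why that number bothers me, here's a back-of-the-envelope calculation. It rests on two assumptions of mine, not the paper's: that 5.6% is a per-crossing collision probability, and that crossings are independent. The function name is hypothetical, and the real denominator in the DMPS experiments may well be different.

```python
def prob_at_least_one_collision(per_crossing_rate, crossings):
    """Chance of one or more collisions over several independent crossings.

    Complement rule: 1 minus the probability that every crossing is clean.
    """
    return 1.0 - (1.0 - per_crossing_rate) ** crossings


# 0.056 is the paper's upper bound, treated here as a per-crossing rate.
for n in (1, 5, 10):
    chance = prob_at_least_one_collision(0.056, n)
    print(f"{n:>2} crossings: {chance:.1%}")
```

Under those assumptions, ten crossings already put the chance of at least one collision at about 44%. Small per-event rates compound quickly, which is why "closer to zero" is not nitpicking.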
The most sobering paper of the bunch is ReasonBreak, which did something unfashionable: instead of showing how well their system works, they showed how easily these AI-powered autonomous vehicles can be broken.
The researchers took NVIDIA's Alpamayo models, which are industry-developed Vision-Language-Action systems (basically the kind of AI that's supposed to see the road, understand what's happening, and decide what to do), and attacked them with "realistic textual input corruptions." Not sophisticated hacking, just the kind of garbled or misleading inputs that could happen in the real world.
The results should worry anyone paying attention: up to 89% attack success rate on the reasoning component, up to 72% on trajectory manipulation, leading to "increased collision rates and degraded safety metrics."
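If "attack success rate" sounds abstract, here's a toy version of how such a number gets measured. To be loud about it: this has nothing to do with Alpamayo or the actual ReasonBreak method. The `toy_planner` below is a hypothetical keyword matcher standing in for a real model, and `corrupt` is a crude character-level garbler I made up.

```python
import random


def corrupt(text, rate, rng):
    """Randomly swap adjacent characters or drop one, at the given rate."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            if i + 1 < len(chars) and rng.random() < 0.5:
                out.extend([chars[i + 1], chars[i]])  # swap with neighbor
                i += 2
            else:
                i += 1  # drop this character
            continue
        out.append(chars[i])
        i += 1
    return "".join(out)


def toy_planner(instruction):
    """Hypothetical stand-in for a model: brakes only if it reads 'stop'."""
    return "brake" if "stop" in instruction.lower() else "proceed"


def attack_success_rate(instructions, rate, trials=200, seed=0):
    """Fraction of corrupted inputs that change the planner's decision."""
    rng = random.Random(seed)
    flipped = 0
    total = 0
    for text in instructions:
        clean_decision = toy_planner(text)
        for _ in range(trials):
            total += 1
            if toy_planner(corrupt(text, rate, rng)) != clean_decision:
                flipped += 1
    return flipped / total


prompts = ["Stop at the red light ahead", "Stop for the pedestrian"]
for rate in (0.0, 0.05, 0.2):
    print(rate, attack_success_rate(prompts, rate))
```

A real Vision-Language-Action model is vastly more capable than a keyword match, but the measurement has the same shape: feed in degraded inputs, count how often the decision changes.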
This is the self-driving car hype cycle all over again. We build impressive demos, we publish papers showing great results in controlled conditions, and then we're surprised when adversarial conditions, which is another way of saying "the real world," break everything.
One paper that caught my attention addresses something I've been wondering about for years: what happens when the communication links between vehicles and infrastructure have delays?
DAROM tackles highway on-ramp merging, which is exactly the kind of scenario where connected vehicles are supposed to shine. A roadside unit sees traffic, processes the data, and tells your car when to merge. Simple in theory. In practice, that communication has latency, sometimes up to 2 seconds according to the paper, and that latency is stochastic, meaning it varies randomly.
Their solution uses a "Delay-Aware Encoder" to help the AI figure out what's actually happening now based on delayed information. They tested it using real traffic data from the Next Generation Simulation dataset and got over 99% success rates even with random delays up to 2 seconds.
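To get a feel for what a 2-second delay means on a highway, here's a small sketch. It is not DAROM's Delay-Aware Encoder, which is a learned component; I've swapped in plain constant-velocity extrapolation, the simplest possible way to guess where a vehicle is now from a stale report. The 30 m/s speed (about 108 km/h) and all function names are my assumptions.

```python
import random


def stale_position_error(speed_mps, delay_s):
    """How far a vehicle has moved since a delayed report was sent."""
    return speed_mps * delay_s


def compensate_delay(position_m, speed_mps, delay_s):
    """Constant-velocity guess at the vehicle's current position."""
    return position_m + speed_mps * delay_s


rng = random.Random(0)
reported_position_m = 100.0
speed_mps = 30.0  # assumed highway speed, not taken from the paper

# Random delays between 0 and 2 seconds, as a stand-in for stochastic latency.
for _ in range(5):
    delay_s = rng.uniform(0.0, 2.0)
    error_m = stale_position_error(speed_mps, delay_s)
    now_m = compensate_delay(reported_position_m, speed_mps, delay_s)
    print(f"delay {delay_s:.2f} s: report is {error_m:.1f} m stale, "
          f"estimated position {now_m:.1f} m")
```

At the full 2 seconds, the report is 60 meters out of date at that speed, which is several car lengths in a merge. Extrapolation like this only works if you know the delay and the vehicle holds its speed, and that is presumably why a learned encoder is worth a paper.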
That's impressive work. But it also highlights how many problems we're still solving at the fundamental level. We're in 2025 and we're still publishing papers about how to handle communication delays in vehicle-to-infrastructure systems. The infrastructure for connected vehicles barely exists outside test corridors. And the papers themselves acknowledge they're working in simulation.
A review paper published this week, "A Review of Learning-Based Motion Planning," actually says the quiet part out loud. The authors note that learning-based methods "are often constrained by opacity and safety risks" and that traditional rule-based systems "often fail to generalize in complex scenarios."
In other words: the old approach is too rigid and the new approach is too unpredictable, and we're trying to find some middle ground. The paper proposes a "Data-Driven Optimal Control" paradigm and offers what they call "the first roadmap" for implementing it.
I've been covering tech long enough to be skeptical of roadmaps. The self-driving industry has been five years away from full autonomy for about fifteen years now. But what struck me about this paper is the honesty: the authors identify four future research directions needed "to close the remaining reality gap."
The reality gap. That's the thing. All these papers are doing good work in simulation, in controlled environments, in carefully constructed test scenarios. The gap between that and actual roads with actual humans making actual unpredictable decisions is enormous, and we don't have a clear path across it.
I talk to a lot of autonomous vehicle startups, and most of the founders are too young to remember the first wave of this hype. They weren't around when Google was promising self-driving cars by 2018. They don't remember Uber's fatal crash in Arizona in 2018, or the years of regulatory chaos that followed.
They see these research papers and they see progress. And there is progress! SARAD's approach to reducing unsafe exploration is clever. DMPS's work on heterogeneous traffic coordination addresses real problems. DAROM's handling of communication latency is genuinely useful.
But progress in simulation is not the same as progress on roads. And the adversarial work in ReasonBreak shows just how fragile these systems can be when conditions aren't controlled.
Look, I'm not saying autonomous vehicles will never work. I'm not even saying these papers are bad; most of them represent solid research tackling real problems. What I am saying is that the coverage of this stuff tends to be breathless and optimistic in ways that don't match reality.
We're still working on fundamental problems like communication latency and adversarial robustness. The best results are in simulation, and the simulations are simplified versions of reality. The safety improvements are real but incremental, and a collision rate of up to 5.6% in simulation is not something anyone should be celebrating.
The self-driving car industry has burned through billions of dollars and decades of hype. Maybe this time is different. Maybe large language models and reinforcement learning and differentiable model predictive control will finally crack the code.
But what do I know. I've just been watching this space long enough to recognize the pattern. If you want to argue, my email's on the about page.