Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
Autonomous vehicles have a confidence problem, and I don't mean they lack it. Two papers published this week on arXiv attack the same fundamental issue from different angles: current self-driving systems are often too certain about uncertain futures, and that overconfidence can kill people.
The disconnect between prediction and planning has been a known bottleneck for years. From my time in hardware, I learned that the most dangerous systems aren't the ones that fail obviously. They're the ones that fail while reporting everything is fine. These papers suggest the AV industry is finally taking that lesson seriously.
Most autonomous driving stacks work in stages. First, a perception module identifies objects. Then a prediction module guesses where those objects will go. Finally, a planning module decides what the vehicle should do. The problem is the handoff between prediction and planning.
Current approaches typically do one of two things, and both have issues. Some systems compress all their predictions into a single "most likely" future and plan around that. Others use black-box end-to-end neural networks that skip the interpretable planning step entirely. The first approach ignores uncertainty. The second hides it.
Look, if your prediction module says there's a 60% chance the pedestrian walks forward, a 30% chance they stop, and a 10% chance they dart into traffic, you need a planner that actually reasons about all three scenarios. Most don't.
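To make that concrete, here is a minimal sketch of the difference between planning around the single most likely future and planning across all three. The probabilities come from the example above; the two actions and every cost value are numbers I made up for illustration, not anything from either paper.

```python
# Scenario probabilities from the pedestrian example above.
SCENARIOS = {"walks_forward": 0.60, "stops": 0.30, "darts_into_traffic": 0.10}

# Hypothetical cost of each action under each scenario (higher is worse).
COSTS = {
    "maintain_speed": {"walks_forward": 1.0, "stops": 0.0, "darts_into_traffic": 100.0},
    "slow_down": {"walks_forward": 2.0, "stops": 2.0, "darts_into_traffic": 5.0},
}


def plan_most_likely(scenarios, costs):
    """Pick the cheapest action under the single most probable scenario."""
    likely = max(scenarios, key=scenarios.get)
    return min(costs, key=lambda action: costs[action][likely])


def plan_expected(scenarios, costs):
    """Pick the action with the lowest probability-weighted cost."""
    def expected_cost(action):
        return sum(p * costs[action][name] for name, p in scenarios.items())
    return min(costs, key=expected_cost)


print(plan_most_likely(SCENARIOS, COSTS))  # plans only for "walks_forward"
print(plan_expected(SCENARIOS, COSTS))     # weighs all three scenarios
```

With these invented costs, the single-future planner keeps its speed because the 10% case never enters the calculation, while the expected-cost planner slows down.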
The first paper, from a team whose affiliations aren't specified in the abstract, proposes what they call "sample-conditioned differentiable planning." The core idea is to use a conditional diffusion model (the same family of models behind image generators like Stable Diffusion) to generate multiple plausible futures rather than picking one.
Here's where it gets interesting. Instead of just generating diverse scenarios and hoping the planner figures it out, they feed these samples directly into an optimization process with an explicit risk constraint. They use something called Conditional Value-at-Risk, or CVaR, which focuses specifically on tail risks (the rare but catastrophic outcomes).
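For readers who haven't met CVaR before, a rough sketch of the standard empirical estimator may help: sort the sampled costs and average the worst fraction. This is the textbook sample version, not the paper's differentiable formulation, and the function name and cost values are my own placeholders.

```python
import math


def empirical_cvar(costs, alpha):
    """Average of the worst (1 - alpha) fraction of sampled costs.

    alpha = 0.0 reduces to the plain mean; alpha near 1.0 looks only at the
    most extreme samples.
    """
    if not costs:
        raise ValueError("need at least one sampled cost")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # round() guards against float noise such as (1 - 0.95) * 20 = 1.0000000000000009
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Hypothetical sampled costs for two candidate plans across ten futures.
plan_a = [1.0] * 9 + [12.0]  # usually cheap, occasionally very bad
plan_b = [3.0] * 10          # consistently mediocre

print(sum(plan_a) / len(plan_a), sum(plan_b) / len(plan_b))
print(empirical_cvar(plan_a, 0.9), empirical_cvar(plan_b, 0.9))
```

On these made-up samples the mean favors plan A, but CVaR at the 0.9 level favors plan B, because it only looks at the worst tenth of outcomes. That is the behavior a tail-risk constraint is meant to enforce.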
The technical contribution that caught my attention is their directed graph representation for scene context. They claim it improves both prediction quality and computational efficiency, though the abstract doesn't provide specific benchmarks. The real test is always production volume, and I'd want to see latency numbers before getting too excited.
They validated on the Waymo Open Motion and Argoverse 2 datasets, reporting improvements in safety, efficiency, and ride comfort over state-of-the-art baselines. The abstract doesn't say what the actual numbers are.
The second paper takes a more mathematically conservative route. Their framework, which they call RA-MPC (risk-aware model predictive control), doesn't try to model the true uncertainty distribution at all. Instead, it uses conformal prediction to generate "prediction sets" that come with statistical coverage guarantees without assuming any particular form for the underlying distribution.
This is, in a way, an admission of ignorance that makes the system safer. Rather than saying "we think there's a 10% chance of X," the system says "we can guarantee with 95% confidence that the outcome will be within this set." The planner then optimizes while respecting those bounds.
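Here is a minimal sketch of how a prediction set like that can be built with standard split conformal prediction. I'm assuming a held-out calibration set of prediction errors (say, distance in meters between predicted and actual positions) and that new cases are exchangeable with the calibration ones. The function name and data are hypothetical, and this is the generic recipe, not the paper's extension.

```python
import math


def conformal_radius(calibration_errors, alpha):
    """Radius r such that a new error is <= r with probability >= 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration error.
    Returns infinity when there is too little calibration data for that level.
    """
    n = len(calibration_errors)
    # round() guards against float noise before taking the ceiling
    rank = math.ceil(round((n + 1) * (1.0 - alpha), 9))
    if rank > n:
        return math.inf
    return sorted(calibration_errors)[rank - 1]


# Hypothetical calibration errors in meters: 0.1, 0.2, ..., 1.9
errors = [i / 10 for i in range(1, 20)]
radius = conformal_radius(errors, alpha=0.1)
print(radius)  # the planner would keep clear of a disc of this radius
```

The prediction set is then everything within `radius` of the predicted position. Note that the function returns infinity when the calibration set is too small for the requested confidence level, which is the honest answer: with too little data, no finite bound can be guaranteed.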
The authors extend something called conformal risk control to handle general spectral risk measures. I've seen enough spec sheets to know that "general" often means "theoretically general but practically limited," so I'd want to see which specific risk measures they actually tested.
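For context, a spectral risk measure in its discrete form is a weighted average of the sorted costs, with weights that never decrease as the costs get worse. The sketch below shows that textbook definition, with the plain mean and a CVaR-style measure as two special cases. The function name, weights, and costs are illustrative, and it says nothing about which measures the authors actually tested.

```python
def spectral_risk(costs, weights):
    """Weighted average of costs sorted from best to worst.

    weights must be non-negative, non-decreasing, and sum to 1, so worse
    outcomes never count for less than better ones.
    """
    if len(costs) != len(weights):
        raise ValueError("costs and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if any(later < earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must be non-decreasing")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * c for w, c in zip(weights, sorted(costs)))


costs = [4.0, 1.0, 3.0, 2.0]  # hypothetical sampled costs

print(spectral_risk(costs, [0.25, 0.25, 0.25, 0.25]))  # uniform weights: the mean
print(spectral_risk(costs, [0.0, 0.0, 0.5, 0.5]))      # worst half only: CVaR-style
```

Choosing the weight profile is how you dial risk aversion up or down, which is why a result covering the whole family matters more than one covering CVaR alone.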
They validated in simulated vehicle obstacle avoidance scenarios, reporting improved safety and reduced solve time compared to baseline RA-MPC. Simulation is a necessary first step, but it's only that.
Both papers address what I'd call the "interpretability versus capability" tradeoff that's been plaguing autonomous systems. End-to-end neural networks can be powerful but opaque. Traditional modular systems are interpretable but often ignore uncertainty at the interfaces.
These approaches try to have it both ways: use modern machine learning for prediction while maintaining explicit, interpretable optimization for planning. It's too early to say whether they succeed in practice.
The timing isn't coincidental. Regulatory pressure is increasing globally, and regulators want to understand why an autonomous vehicle made a specific decision. "The neural network decided" isn't an acceptable answer when someone gets hurt. Frameworks that can point to explicit risk constraints and mathematically grounded safety guarantees are more likely to satisfy both regulators and insurers.
Neither paper addresses hardware constraints in detail, which is where many promising algorithms die. Real-time planning requires solving these optimization problems in milliseconds, not seconds. The conformal prediction paper mentions "reduced solve time" but doesn't give absolute numbers.
There's also the question of training data. Both approaches depend on learned prediction models, and those models are only as good as the scenarios they've seen. Rare events, by definition, are underrepresented in training data. The CVaR approach explicitly targets tail risks, but if those tails aren't well-characterized in the first place, the guarantees become weaker.
I'd also note that both papers focus on single-vehicle planning. Multi-agent coordination, where your uncertainty about other vehicles compounds with their uncertainty about you, is a harder problem that neither fully addresses.
Still, these represent genuine progress on a real bottleneck. The gap between prediction and planning has been a known issue for years, and it's encouraging to see principled approaches rather than just more end-to-end training. Whether these specific methods scale to production remains to be seen, but the direction is right.