Two New Papers Want to Teach Robots to Plan Under Uncertainty. I've Seen This Movie Before.
A pair of fresh arXiv papers tackle robot planning in messy, unpredictable environments. The ideas are genuinely interesting. Whether they survive contact with the real world is another question entirely.
Yesterday · 7 min read
Picture a robot arm in a warehouse. It reaches for a bin, misses slightly, tries again, and somewhere upstream a planner is frantically updating its model of the world based on what the camera might or might not be seeing accurately. That's not a hypothetical. That's Tuesday for anyone actually deploying robots in the field right now.
Two papers dropped on arXiv this week that both take a swing at the same fundamental problem: how do you get a robot to plan intelligently when it can't fully see what's going on, and when its own actions don't always do what they're supposed to? I've been covering tech long enough to remember when "adaptive systems" was the buzzword that was going to fix everything, and before that "expert systems," and before that... well, you get the idea. So I read these with a mix of genuine interest and the specific skepticism that comes from having watched a lot of promising research papers dissolve on contact with a factory floor.
But let's be fair. These are actually worth your time.
Most robot planning research, if we're being honest, is done under conditions that would make real-world engineers laugh. Clean environments. Predictable physics. Complete sensor data. The robot knows where everything is, and when it moves its arm, the arm goes where it's told.
None of that is true in the real world, and everyone knows it.
The technical term for the messier reality is a Partially Observable Markov Decision Process, or POMDP. It's a mouthful, but the concept is straightforward enough: the robot has incomplete information about its environment (partial observability), and its actions have uncertain outcomes (stochasticity). Building POMDP models that actually capture real-world complexity has historically been, to put it politely, a pain. It requires enormous manual effort from domain experts who have to basically hand-code the robot's understanding of uncertainty into the system.
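If that still sounds abstract, here is a toy sketch of the "partial observability" half. It's my own illustration, not anything from either paper: a robot keeps a belief (a probability) that a bin is where it expects, and updates that belief with Bayes' rule each time a noisy camera reports in. The function name and the sensor numbers (`hit_rate`, `false_alarm_rate`) are made up for the example.

```python
def update_belief(prior, observation, hit_rate=0.9, false_alarm_rate=0.2):
    """Bayes update of P(bin is in place) after one noisy camera reading.

    prior: probability the bin is in place before looking.
    observation: True if the camera reports seeing the bin, else False.
    hit_rate: assumed P(camera says yes | bin in place), illustrative only.
    false_alarm_rate: assumed P(camera says yes | bin absent), illustrative only.
    """
    if observation:
        like_present, like_absent = hit_rate, false_alarm_rate
    else:
        like_present, like_absent = 1.0 - hit_rate, 1.0 - false_alarm_rate
    numerator = like_present * prior
    evidence = numerator + like_absent * (1.0 - prior)
    return numerator / evidence


# Start undecided, then fold in three camera readings one at a time.
belief = 0.5
for reading in [True, True, False]:
    belief = update_belief(belief, reading)
    print(f"camera says {reading}: belief = {belief:.3f}")
```

The point is that the planner never gets to know the true state. It only gets this running estimate, and a single contradictory reading can knock its confidence down noticeably. A real POMDP layers uncertain action outcomes on top of this, which is where the hand-coding pain comes from.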
This is where the first paper, an arXiv preprint from a team introducing something called PO-PDDL, comes in. Their idea is to let robots learn these uncertainty models from watching videos of real robot executions, rather than having engineers specify everything by hand. The system watches the robot work, notices when what it expected to happen doesn't match what it actually observed, and uses those inconsistencies to build a model of where the uncertainty lives. The resulting models are expressed in a format (PDDL, which stands for Planning Domain Definition Language) that's compatible with large language models, which is a smart move given where the field is heading.
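To make the "notice the inconsistencies" idea concrete, here is a deliberately crude sketch. To be clear, this is not the paper's algorithm: the real system works from video and produces PDDL models, while this toy just counts how often a hand-written log of expected versus observed outcomes disagrees, per action. The log format, the action names, and `estimate_uncertainty` are all hypothetical.

```python
from collections import defaultdict


def estimate_uncertainty(executions):
    """Estimate how often each action fails to produce its expected effect.

    executions: iterable of (action, expected_effect, observed_effect) tuples.
    Returns {action: fraction of runs where observed != expected}.
    """
    counts = defaultdict(lambda: [0, 0])  # action -> [mismatches, total runs]
    for action, expected, observed in executions:
        counts[action][1] += 1
        if expected != observed:
            counts[action][0] += 1
    return {action: mism / total for action, (mism, total) in counts.items()}


# A made-up execution log: one of three grasps came up empty.
log = [
    ("grasp", "holding", "holding"),
    ("grasp", "holding", "empty"),
    ("grasp", "holding", "holding"),
    ("place", "on_shelf", "on_shelf"),
]
rates = estimate_uncertainty(log)
print(rates)
```

Even this crude version shows why the approach is appealing: the mismatches themselves tell you which actions deserve a probabilistic model and which ones you can treat as reliable, without an engineer guessing in advance.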
The results they report on long-horizon manipulation tasks are genuinely encouraging. Their method outperforms existing POMDP and PDDL learning approaches, and the models it produces are apparently reusable across different tasks in the same domain, which matters a lot if you're trying to deploy this stuff at scale.
The second paper tackles a related but distinct problem. Once you have a plan, how do you actually execute it across wildly different real-world scenarios without having to retune everything from scratch every time the environment changes?
The arXiv paper introducing HOLO-MPPI (High-level Offline, Low-level Online MPPI, in case you were wondering) is focused on autonomous driving specifically, which is a domain I've spent a lot of time watching people make promises about. The basic architecture here is a two-level system: a high-level policy learned offline that proposes robust plans in an abstract action space, and a low-level controller that runs online in real time and adapts to whatever chaos the actual world is throwing at it in the moment.
The clever part is how these two levels interact. The high-level policy feeds a sampling distribution into something called Model Predictive Path Integral control, which is a technique for optimizing control sequences by sampling lots of possible trajectories and weighting them by how well they work. The high-level policy basically tells the low-level controller where to look, and the low-level controller handles the fine-grained adaptation. Offline learning for the big picture, online optimization for the details.
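For readers who want to see the shape of that interaction, here is a minimal sketch of one MPPI update on a toy 1-D problem, using only the Python standard library. Everything here is an assumption for illustration: the point-mass dynamics, the cost function, the parameter values, and especially `high_level_proposal`, which is a hard-coded stand-in for the learned offline policy, not the paper's model.

```python
import math
import random


def rollout_cost(x0, controls, target, dt=0.1):
    """Simulate a 1-D point whose velocity is the control; sum tracking and effort costs."""
    x, cost = x0, 0.0
    for u in controls:
        x += u * dt
        cost += (x - target) ** 2 + 0.01 * u ** 2
    return cost


def mppi_step(x0, mean_controls, target, n_samples=200, sigma=1.0,
              temperature=1.0, rng=random):
    """One MPPI update: sample noisy control sequences around mean_controls,
    weight each by exp(-cost / temperature), and return the weighted average."""
    samples, costs = [], []
    for _ in range(n_samples):
        seq = [u + rng.gauss(0.0, sigma) for u in mean_controls]
        samples.append(seq)
        costs.append(rollout_cost(x0, seq, target))
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    horizon = len(mean_controls)
    return [sum(w * s[t] for w, s in zip(weights, samples)) / total
            for t in range(horizon)]


def high_level_proposal(x0, target, horizon, dt=0.1):
    """Hypothetical stand-in for the offline policy: constant velocity toward the target."""
    velocity = (target - x0) / (horizon * dt)
    return [velocity] * horizon


rng = random.Random(0)
x0, target, horizon = 0.0, 2.0, 10
naive = mppi_step(x0, [0.0] * horizon, target, rng=rng)
guided = mppi_step(x0, high_level_proposal(x0, target, horizon), target, rng=rng)
print("sampling around zero: ", rollout_cost(x0, naive, target))
print("sampling around proposal:", rollout_cost(x0, guided, target))
```

The only difference between the two calls is where the samples are centered. That is the "tells the low-level controller where to look" part: the sampler and the weighting are identical, but a sensible proposal means fewer samples are wasted on trajectories that were never going to work.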
Their evaluation across diverse driving scenarios shows improvements over both pure MPPI and end-to-end reinforcement learning baselines, and critically, they claim it maintains real-time control performance. That last part matters enormously. A planning system that's technically superior but too slow to run in real time is basically a very expensive academic exercise.
So is any of this actually new? Fair question, and I'm going to give a genuinely mixed answer.
The core ideas here, combining symbolic planning with learned uncertainty models, or layering offline learning with online optimization, aren't new in isolation. Researchers have been chasing both of these threads for years. What does feel meaningfully different right now is the combination of better foundation models, more accessible robotics hardware, and a community that's finally getting serious about the gap between lab performance and real-world deployment.
The PO-PDDL paper's decision to make their learned models LLM-compatible is a good example of this. Five years ago that wouldn't have mattered. Today, it potentially means the planning representations these systems learn could be inspected, modified, and reasoned about by language models, which opens up a whole category of human-robot interaction that was previously much harder to imagine.
The HOLO-MPPI paper's focus on eliminating per-scenario retuning also addresses one of the genuinely stubborn practical problems in autonomous driving deployment. Anyone who's watched AV companies struggle to generalize their systems from one city to another knows exactly how painful this problem is. Whether HOLO-MPPI's approach scales to the full diversity of real-world driving conditions remains unclear from the paper alone, and the evaluation, while thorough for a research paper, is based on simulated scenarios. That's a significant caveat.
I'll say this: both papers are tackling the right problems. The question I always ask about robotics research is whether the people doing it have spent enough time watching their systems fail in conditions they didn't design for. The best robotics work comes from researchers who have that haunted look in their eyes.
Here's where I'll put my cards on the table. I think the symbolic planning work, the PO-PDDL direction, has more near-term practical relevance for industrial robotics than the HOLO-MPPI work does for autonomous driving. Not because HOLO-MPPI isn't interesting, but because the autonomous driving space has a very specific disease: it's been overpromised to death, and every new technical result gets immediately drafted into someone's investor deck before it's had a chance to prove itself in the field.
Manipulation robotics, on the other hand, is a domain where companies are actively struggling with exactly the kind of uncertainty modeling problem PO-PDDL addresses, and where the bar for "good enough to deploy" is somewhat more tractable than "handle every possible driving scenario at highway speeds."
Both papers are also honest about their limitations, which I always take as a good sign. The PO-PDDL work acknowledges the challenge of scaling their demonstration-driven pipeline, and the HOLO-MPPI paper is upfront about the fact that their instantiation is specific to driving and would need rethinking for other domains. That kind of intellectual honesty is rarer than it should be in robotics research.
The deeper issue, and this is based on limited data from two papers rather than a systematic survey of the field, is that the gap between "works in our experiments" and "works reliably in uncontrolled environments" remains the central unsolved problem in robotics. These papers move the needle. Neither of them closes the gap.
I've been watching that gap for a long time. It's getting smaller. Slowly, and with a lot of detours, but smaller. These two papers are a legitimate contribution to that process, and I say that as someone who's read enough robotics papers to be properly suspicious of claims that don't survive contact with reality.
Both papers are available on arXiv now. Read them yourself and form your own opinion. That's what I'd tell any of the young researchers coming up in this space: read the primary sources, not the press releases. And if you want to argue about my take on any of this, my email's on the about page.