Two New POMDP Solvers Promise Faster Robot Planning Under Uncertainty, But the Gap to Real-World Deployment Remains Wide
ROP-RAS3 and VOPP represent genuine algorithmic progress for partially observable planning, though the robotics community should temper its excitement until we see more diverse benchmarks.
Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
Planning under uncertainty is one of the genuinely hard problems in robotics. I say this not as hyperbole but as someone who spent years watching graduate students struggle with the computational intractability of Partially Observable Markov Decision Processes. So when two papers arrive within weeks of each other, both claiming substantial improvements to online POMDP solving, it is worth paying attention. It is also worth being precise about what they actually achieve.
The papers in question are ROP-RAS3, which extends work from ISRR24, and VOPP, a vectorized approach that exploits modern GPU parallelism. Both tackle the same fundamental challenge: how do you plan effectively when you cannot observe the full state of the world, when your actions have uncertain outcomes, and when you need to think many steps ahead? The short answer from both papers is that you can do much better than current methods if you are clever about sampling and parallelization. The longer answer, as always, involves caveats.
For readers unfamiliar with the formalism, a POMDP extends the standard Markov Decision Process by acknowledging that agents rarely have perfect information. A robot navigating a cluttered warehouse does not know exactly where every obstacle is. A manipulator grasping an object cannot be certain of its exact pose. The POMDP framework captures this by maintaining a belief state (a probability distribution over possible world states) and updating it as observations arrive.
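To make the belief update concrete, here is a minimal sketch of the discrete Bayes filter that underlies it. The Tiger-style toy problem, its 85% listening accuracy, and every function name are my own illustrative assumptions; neither paper is the source of this code, and both handle far richer state spaces.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """One step of a discrete Bayes filter over POMDP states.

    belief: dict mapping state -> probability
    transition(s, a): dict mapping next_state -> probability
    observation_model(o, s_next, a): probability of seeing o in s_next after a
    """
    # Prediction: push the current belief through the transition model.
    predicted = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * t

    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = {
        s: observation_model(observation, s, action) * p
        for s, p in predicted.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical toy problem: a tiger sits behind the left or right door.
def tiger_transition(state, action):
    return {state: 1.0}  # listening does not move the tiger


def tiger_observation(obs, state, action):
    return 0.85 if obs == state else 0.15  # assumed 85% accurate listening


belief = {"left": 0.5, "right": 0.5}
belief_after_one = belief_update(
    belief, "listen", "left", tiger_transition, tiger_observation
)
belief_after_two = belief_update(
    belief_after_one, "listen", "left", tiger_transition, tiger_observation
)
```

Each consistent observation sharpens the belief, and a planner reasons over these distributions rather than over a single known state.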
The problem is that solving POMDPs optimally is intractable: even the finite-horizon problem is PSPACE-complete. In practice, this means that as your planning horizon grows, the computational cost explodes. Modern online solvers like POMCP and DESPOT have made progress by using Monte Carlo tree search and clever pruning, but they still struggle with long horizons. To be precise, most benchmarks in the literature use planning horizons of tens to maybe a few hundred steps. Real-world tasks often require thousands.
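To put a rough number on that explosion: for a discrete POMDP, exhaustive lookahead builds a belief tree that branches on every action and then on every observation. Writing N(h) for the number of belief nodes down to depth h, and taking five actions, five observations, and ten steps as illustrative assumptions of mine (not figures from either paper):

```latex
N(h) = \sum_{d=0}^{h} \left( |A|\,|O| \right)^{d},
\qquad
\left( |A|\,|O| \right)^{h} \Big|_{|A| = 5,\ |O| = 5,\ h = 10} = 25^{10} \approx 9.5 \times 10^{13}
```

Sampling-based solvers avoid building this full tree, but the exponent in h is the reason long horizons remain hard.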
This is where both new papers claim their contribution.
ROP-RAS3 (Reference-Based Online POMDP Planning via Rapid State Space Sampling) takes an interesting approach to the curse of dimensionality in action spaces. The key insight is that rather than exhaustively enumerating possible actions, which is what most solvers do, you can use sampling-based motion planning techniques to generate a diverse set of macro actions on the fly. These macro actions are sequences of primitive actions that move the system toward sampled goal states.
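The general idea of goal-directed macro actions can be sketched in a few lines. This is a toy under loud assumptions: a point robot on an integer grid, uniform goal sampling, and a greedy step rule, all of which I made up for illustration. It is not the sampler used in ROP-RAS3, which builds on proper sampling-based motion planning.

```python
import random

# Hypothetical primitive actions for a point robot on an integer grid.
PRIMITIVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def macro_action_toward(start, goal, max_len):
    """Greedy sequence of primitive steps from start toward goal.

    Stops when the goal is reached or after max_len primitives.
    """
    x, y = start
    steps = []
    while (x, y) != goal and len(steps) < max_len:
        # Pick the primitive that leaves the smallest Manhattan distance.
        name, (dx, dy) = min(
            PRIMITIVES.items(),
            key=lambda kv: abs(goal[0] - (x + kv[1][0]))
            + abs(goal[1] - (y + kv[1][1])),
        )
        x, y = x + dx, y + dy
        steps.append(name)
    return steps


def sample_macro_actions(start, num_samples, bounds, max_len, rng):
    """Sample goal states and return one macro action per sampled goal."""
    (xmin, xmax), (ymin, ymax) = bounds
    macros = []
    for _ in range(num_samples):
        goal = (rng.randint(xmin, xmax), rng.randint(ymin, ymax))
        macros.append((goal, macro_action_toward(start, goal, max_len)))
    return macros


rng = random.Random(0)
macros = sample_macro_actions(
    start=(0, 0), num_samples=4, bounds=((-5, 5), (-5, 5)), max_len=6, rng=rng
)
example = macro_action_toward((0, 0), (2, 1), max_len=10)
```

The point to notice is that the planner only ever branches over `num_samples` macro actions, however many primitive sequences exist in principle.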
The paper demonstrates this on problems with up to 3000 lookahead steps and 35-dimensional state spaces. Those numbers are genuinely impressive compared to prior work. The method handles continuous, discrete, and hybrid state-action-observation spaces, which matters because real robots rarely fit neatly into discrete categories.
The theoretical contribution is a convergence guarantee that depends on the number of sampled actions rather than the size of the action space. This is, I think, the most important result in the paper. It means that the method scales with the difficulty of the sampling problem rather than the combinatorial explosion of possible actions.
VOPP takes a different tack. Rather than changing how actions are sampled, it focuses on parallelization. The paper builds on a recent POMDP formulation that analytically solves part of the optimization, leaving only expectation estimation for numerical computation. This is clever because expectation estimation parallelizes beautifully: you can run thousands of Monte Carlo rollouts simultaneously on a GPU without synchronization bottlenecks.
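Why does expectation estimation parallelize so cleanly? Because each rollout is independent of every other. The sketch below uses a made-up random-walk reward model and Python threads purely to illustrate that independence; it is not VOPP's GPU implementation, and Python threads will not deliver a real speedup here.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def rollout(seed, horizon=20, discount=0.95):
    """One Monte Carlo rollout of a toy random walk (hypothetical model)."""
    rng = random.Random(seed)  # private RNG, so rollouts share no state
    position, total = 0, 0.0
    for t in range(horizon):
        position += rng.choice((-1, 1))
        # Reward of 1 whenever the walker is on the positive side.
        total += (discount ** t) * (1.0 if position > 0 else 0.0)
    return total


def estimate_value(seeds, executor=None):
    """Average the rollout returns, sequentially or through an executor."""
    mapper = executor.map if executor is not None else map
    returns = list(mapper(rollout, seeds))
    return sum(returns) / len(returns)


seeds = range(1000)
sequential = estimate_value(seeds)
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = estimate_value(seeds, pool)
```

Both paths produce the same estimate because no rollout reads or writes anything another rollout touches. On a GPU the same structure would become one batched array operation per time step.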
The results are striking. VOPP claims to be at least 20 times more efficient than existing parallel solvers at computing near-optimal solutions. More provocatively, it outperforms state-of-the-art sequential solvers while using a planning budget that is 1000 times smaller. If those numbers hold up, this is a substantial advance.
I know I'm being picky here, but the benchmarks in both papers warrant scrutiny.
ROP-RAS3's experiments include navigation tasks and manipulation scenarios, which is good. The physical robot demonstration is welcome; too many planning papers never leave simulation. However, the comparison is primarily against DESPOT-based methods, and the success rate improvements (described as "multiple folds") are measured on problems specifically designed to have long horizons. It remains unclear how the method performs on the standard POMDP benchmarks that the community has used for decades. This is not necessarily a criticism (the authors may simply be targeting a different regime), but it makes comparison difficult.
VOPP's efficiency claims are impressive but come with hardware assumptions. The 20x speedup and 1000x budget reduction are measured on GPU hardware that not every robotics lab has access to. The paper is transparent about this, which I appreciate, but it does mean that the practical impact depends on your computational resources. There is also the question of whether the analytical reformulation they exploit applies to all POMDP variants or only a subset. The paper addresses continuous and discrete spaces, but hybrid spaces receive less attention.
Neither paper provides extensive ablation studies on observation noise levels. Real sensors are noisy in complex, non-Gaussian ways. The standard POMDP formulation assumes you know the observation model, but in practice, this model is often wrong. How robust are these methods to model mismatch? We don't know yet.
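A toy calculation shows why model mismatch worries me. Suppose a listening sensor is really only 60% accurate, but the planner's observation model assumes 95%. The scenario and numbers are hypothetical, chosen only to show the effect; neither paper is the source of them.

```python
def posterior_left(observations, assumed_accuracy):
    """Posterior that a static target is on the left, from a uniform prior."""
    like_left, like_right = 1.0, 1.0
    for obs in observations:
        like_left *= assumed_accuracy if obs == "left" else 1.0 - assumed_accuracy
        like_right *= assumed_accuracy if obs == "right" else 1.0 - assumed_accuracy
    return like_left / (like_left + like_right)


# Three "left" readings and one "right" reading from a noisy sensor.
observations = ["left", "left", "right", "left"]
calibrated = posterior_left(observations, assumed_accuracy=0.60)
overconfident = posterior_left(observations, assumed_accuracy=0.95)
```

With the correct 60% model the belief in "left" is about 0.69; with the wrong 95% model it is above 0.99. A planner acting on the second belief would commit long before the evidence justifies it.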
The optimistic view is that these papers represent genuine algorithmic progress on a fundamental problem. If robots are going to operate autonomously in unstructured environments, they need to plan under uncertainty. Methods that can handle longer horizons and larger state spaces directly expand the envelope of what is possible.
The pessimistic view, or perhaps the realistic one, is that POMDP planning is only one piece of a much larger puzzle. Most deployed robots do not use POMDP solvers. They use simpler reactive policies, learned behaviors, or carefully engineered state machines. The gap between what works in a POMDP benchmark and what works in a factory or home is vast.
I think the truth is somewhere in between. These methods are unlikely to revolutionize deployed robotics in the next year. But they expand the theoretical and empirical understanding of what is computationally tractable. That matters for the field's long-term trajectory.
First, how do these methods compose with learned components? Modern robotics increasingly uses neural networks for perception and sometimes for policy representation. Can ROP-RAS3's macro action sampling work with learned dynamics models? Can VOPP's vectorized planning integrate with neural network value functions? Neither paper addresses this directly.
Second, what about multi-agent settings? Many real-world robotics problems involve multiple robots or robots interacting with humans. The single-agent POMDP formulation does not capture these scenarios well, and it is not obvious how to extend these methods.
Third, and this is perhaps the most important question, what is the sample complexity of deploying these methods on a new task? Both papers assume access to accurate dynamics and observation models. In practice, learning these models from data is often the bottleneck. A method that is 20 times more efficient at planning but requires 100 times more data to learn the model is not necessarily a win.
If I were reviewing follow-up work, here is what I would look for.
For ROP-RAS3: a direct comparison on standard POMDP benchmarks (RockSample, Tiger, etc.) to establish where the method sits relative to existing solvers. Also, experiments with learned dynamics models rather than ground-truth simulators.
For VOPP: ablation studies on observation noise and model mismatch. Also, a clearer characterization of which POMDP variants the analytical reformulation applies to.
For both: deployment on a more diverse set of physical robots. Navigation and manipulation are important, but so are inspection, search and rescue, and collaborative assembly. The field needs to move beyond the same handful of demonstration scenarios.
(As an aside, I am always slightly suspicious when papers report success rates rather than expected cumulative reward. Success rate is easier to interpret but can hide important information about failure modes and near-misses. Both papers report success rates, which is common in the literature, but expected cumulative reward is usually the more informative metric for comparing planners because it also reflects how costly the failures were.)
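A quick illustration of how the two metrics can disagree. The episode logs below are invented for the example and do not come from either paper.

```python
def success_rate(episodes):
    """Fraction of episodes flagged as successful."""
    return sum(1 for ok, _ in episodes if ok) / len(episodes)


def mean_return(episodes):
    """Average cumulative reward per episode."""
    return sum(reward for _, reward in episodes) / len(episodes)


# Hypothetical logs: (succeeded, cumulative reward) per episode.
# Planner A fails gently (times out); planner B fails by colliding.
planner_a = [(True, 10.0)] * 8 + [(False, -1.0)] * 2
planner_b = [(True, 10.0)] * 8 + [(False, -100.0)] * 2

rate_a, rate_b = success_rate(planner_a), success_rate(planner_b)
return_a, return_b = mean_return(planner_a), mean_return(planner_b)
```

Both planners succeed 80% of the time, yet their mean returns are 7.8 and -12.0. A success-rate table would call them equivalent.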
These papers arrive at an interesting moment for robotics planning research. The field has spent the last decade increasingly focused on learning-based methods, sometimes to the exclusion of classical planning. There is a reasonable argument that end-to-end learning will eventually subsume planning entirely. There is also a reasonable counterargument that planning provides guarantees and interpretability that learning cannot.
I do not think either view is entirely correct. The most capable systems will likely combine learned perception and low-level control with principled planning for high-level decision-making. Methods like ROP-RAS3 and VOPP, if they can be integrated with learned components, could form part of that hybrid architecture.
But we are not there yet. The gap between benchmark performance and real-world deployment remains wide. These papers narrow it slightly. That is progress, even if it is incremental.
It is worth noting that both papers make their code available. This is increasingly expected in the robotics research community, but it still deserves acknowledgment. Open code allows others to verify claims, build on results, and identify limitations that the original authors may have missed. ROP-RAS3's code is on GitHub at the RDLLab repository. VOPP's availability is mentioned, but the specific repository is not listed in the abstract, so that claim still needs verifying.
If you work on robot planning under uncertainty, these papers are worth reading carefully. ROP-RAS3 offers a new approach to action sampling that could be useful for long-horizon problems with large action spaces. VOPP offers a parallelization strategy that could dramatically reduce planning time if you have access to appropriate hardware.
If you work on deployed robotics systems, these papers are interesting but probably not immediately actionable. The gap between POMDP benchmarks and real-world deployment involves many challenges that neither paper addresses: sensor noise, model uncertainty, multi-agent coordination, human interaction.
If you are a graduate student looking for research directions, the open questions I listed above are all tractable and important. The intersection of POMDP planning with learned models is particularly underexplored.
Planning under uncertainty has been a hard problem for decades. It will remain hard for decades more. But the difficulty is finite, and these papers chip away at it. That is, in the end, how progress happens in robotics: not through revolutionary breakthroughs but through steady accumulation of better algorithms, better implementations, and better understanding of what works and what does not.
I remain cautiously optimistic. Cautiously, because I have seen too many planning papers that work beautifully in simulation and fail catastrophically on real robots. Optimistic, because the methods here are grounded in solid theory and demonstrate real empirical improvements. The next step is to see whether those improvements survive contact with the physical world.