Can Robots Finally Be Trusted to Follow Rules Under Uncertainty? Two New Papers Make a Serious Attempt
A pair of arXiv papers tackle one of industrial robotics' nastiest unsolved problems: how do you get a robot to obey safety specifications when the world refuses to cooperate?
9 hours ago · 8 min read
If you've spent any time around real production hardware, you already know the answer to this question: no, robots cannot be fully trusted to follow complex rules in uncertain environments. Not yet. The gap between what a robot does in simulation and what it does when a conveyor belt vibrates unexpectedly, or a pedestrian cuts across a loading dock at an odd angle, is where deployments go wrong and liability lawyers get busy.
Two papers posted to arXiv in late June are trying to close that gap, using a branch of formal methods called Signal Temporal Logic. They're both worth reading carefully, because they're attacking the same underlying problem from slightly different angles, and together they suggest the field is converging on something that might actually work outside a lab.
Signal Temporal Logic, or STL, is a formal language for specifying how real-valued signals must behave over time, written in a form an optimizer can work with. Instead of writing software rules like "don't hit the wall," you write formal constraints like "the robot must remain at least 0.5 meters from any obstacle for the entire trajectory horizon." STL lets you quantify not just whether a constraint is satisfied, but by how much. That margin is the "robustness" measure, and it is what makes gradient-based optimization possible.
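To make the robustness idea concrete, here is a minimal Python sketch of that 0.5-meter clearance rule. The function names, the sample distance traces, and the log-sum-exp smoothing are illustrative assumptions, not something taken from either paper. The hard minimum is the textbook robustness of an "always" constraint, and the smooth version is one common way to get a usable gradient out of it.

```python
import math

def always_clearance(distances, threshold=0.5):
    """Robustness of 'always (distance >= threshold)': the worst-case margin."""
    return min(d - threshold for d in distances)

def soft_min(values, k=10.0):
    """Smooth stand-in for min() via log-sum-exp (hypothetical helper).

    Never exceeds the true minimum, and approaches it as k grows.
    """
    m = min(values)
    return m - math.log(sum(math.exp(-k * (v - m)) for v in values)) / k

# Hypothetical obstacle distances (meters) sampled along two trajectories.
safe_trace = [1.2, 0.9, 0.7, 0.8]
risky_trace = [1.2, 0.9, 0.4, 0.8]

rho_safe = always_clearance(safe_trace)    # positive: satisfied with margin
rho_risky = always_clearance(risky_trace)  # negative: violated
rho_soft = soft_min([d - 0.5 for d in safe_trace])

print(rho_safe, rho_risky, rho_soft)
```

A positive score means the rule held with that much room to spare, and a negative one means it was violated by that much. The smooth variant never exceeds the hard minimum, so it errs on the conservative side.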
For industrial automation, this matters enormously. Collaborative robots, autonomous forklifts, inspection drones: all of them need to satisfy layered safety specifications simultaneously. Speed limits near human workers. No-go zones around machinery. Priority rules when a collision is unavoidable. STL is one of the more rigorous tools available for encoding those requirements formally rather than just hoping your control engineer got the if-statements right.
The problem is that existing STL approaches were largely built for deterministic systems. The real world is not deterministic.
The first paper, "pdSTL: Probabilistic Differentiable Signal Temporal Logic for Stochastic Systems," introduces a framework that extends STL to work over belief trajectories rather than point estimates of state. That's a meaningful distinction. Instead of assuming your robot knows exactly where it is and where obstacles are, pdSTL accounts for the full probability distribution over possible states, including sensing noise and stochastic dynamics.
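A small sketch shows why the belief-versus-point-estimate distinction matters. It assumes a one-dimensional Gaussian belief over obstacle distance at each step, which is a simplification chosen for illustration and not the paper's model. The helper names and the numbers are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_clearance(mean, std, threshold=0.5):
    """P(distance >= threshold) when distance ~ N(mean, std^2)."""
    return 1.0 - normal_cdf((threshold - mean) / std)

# Hypothetical belief trajectory: (mean distance, standard deviation) per step.
beliefs = [(1.2, 0.05), (0.9, 0.10), (0.7, 0.20), (0.8, 0.10)]

# A point-estimate view only looks at the mean and reports a margin.
point_margins = [mean - 0.5 for mean, _ in beliefs]

# A belief-space view asks how likely the clearance rule is to hold.
step_probs = [prob_clearance(mean, std) for mean, std in beliefs]

for margin, p in zip(point_margins, step_probs):
    print(f"margin={margin:+.2f} m  P(satisfied)={p:.4f}")
```

Every point-estimate margin here is positive, so a deterministic monitor would report the rule as satisfied throughout. The third step, where the belief is widest, still leaves a noticeable chance of violation, and that is the information a point estimate throws away.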
The core technical contribution is twofold. First, the authors use interval-valued probabilistic semantics to compute conservative satisfaction bounds, meaning the framework tells you not just the expected robustness but a lower bound you can actually trust. Second, they reformulate the temporal robustness evaluation as a recurrent, LSTM-style unfolding of STL operators. That's a clever piece of engineering: by structuring the computation like a recurrent neural network, they get linear-time differentiable monitoring, which is what you need for end-to-end trajectory optimization.
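The recurrent structure is easier to see in code. The sketch below is not the paper's semantics. It assumes per-step satisfaction probabilities are already available, and it uses the classical Fréchet bounds, which hold without any independence assumption, to carry a conservative [lower, upper] interval for an "always" operator backward through the horizon in a single linear-time pass. The function name and sample probabilities are hypothetical.

```python
def always_interval(step_probs):
    """Backward recurrent scan over per-step satisfaction probabilities.

    Returns a list of (lower, upper) bounds on the probability that the
    predicate holds at every step from t to the end of the horizon.
    One pass, constant work per step, so the cost is linear in the horizon.
    """
    lo, hi = 1.0, 1.0  # the empty suffix is trivially satisfied
    bounds = []
    for p in reversed(step_probs):
        # Frechet bounds for a conjunction of two events:
        #   max(0, P(A) + P(B) - 1) <= P(A and B) <= min(P(A), P(B))
        lo = max(0.0, lo + p - 1.0)
        hi = min(hi, p)
        bounds.append((lo, hi))
    bounds.reverse()
    return bounds

# Hypothetical per-step probabilities that the clearance rule holds.
probs = [0.99, 0.95, 0.90]
intervals = always_interval(probs)
whole_lo, whole_hi = intervals[0]
print(whole_lo, whole_hi)
```

The state carried from one step to the next is just the pair `(lo, hi)`, which is what gives the computation its recurrent shape. The lower bound is the number a safety argument would lean on, because it holds however the per-step events are correlated.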
I've seen enough spec sheets to be skeptical when papers claim "significantly outperforming" a baseline without showing the full picture, but the validation here is at least honest about scope. They tested on simulated obstacle avoidance and lane-change maneuvers, and then ran real-world experiments on a Crazyflie quadcopter under aerodynamic disturbances. That last part is important. Crazyflie platforms are small and genuinely noisy, and aerodynamic disturbances are exactly the kind of stochastic perturbation that breaks deterministic controllers.
The results show pdSTL maintaining better safety margins than deterministic differentiable STL under those real-world conditions. How much better? The paper quantifies this, though the exact margin depends on the scenario and disturbance level. What's notable is that the formal probabilistic guarantees hold compositionally through the STL syntax tree, meaning the guarantees don't degrade as you add more complex specifications. That's the property you actually need for industrial deployment.
The second paper, "Autonomous Driving with Priority-Ordered STL Specifications Under Multimodal Uncertainty," is tackling a related but distinct problem: what happens when you can't satisfy all your safety specifications at once?
This is a scenario every engineer who has worked on real autonomous systems has confronted. In a safety-critical situation, a robot or vehicle might face a genuine conflict between specifications. Maintain safe following distance. Stay in lane. Don't exceed the speed limit. In certain edge cases, you cannot satisfy all three simultaneously. Something has to give. The question is what, and whether the system gives in a principled, predictable way.
The authors propose a lexicographic ordering over STL specifications, which is a formal way of saying "rank your requirements and satisfy the highest-priority ones first, then optimize for lower-priority ones within that constraint." The key contribution is that this priority ordering stays valid under uncertainty, which is non-trivial. If you're uncertain about where other vehicles are going, the set of achievable trajectories changes, and a naive priority scheme can break down.
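Here is a rough sketch of what lexicographic selection looks like over a handful of candidate trajectories. It is a deterministic toy, not the paper's algorithm: the candidate values, the three specifications, and the choice to clip robustness at zero (so that extra margin on a satisfied high-priority rule never outranks a lower-priority one) are all assumptions made for illustration.

```python
def lexicographic_select(candidates, specs, tol=1e-9):
    """Filter candidates one priority level at a time.

    `specs` is ordered from highest to lowest priority; each maps a
    candidate to a robustness score (positive means satisfied).
    """
    remaining = list(candidates)
    for spec in specs:
        # Clip at zero: only the degree of violation matters at this level.
        scores = [min(spec(c), 0.0) for c in remaining]
        best = max(scores)
        remaining = [c for c, s in zip(remaining, scores) if s >= best - tol]
    return remaining

# Hypothetical candidates: gap to lead vehicle (m), lane offset (m), speed (m/s).
candidates = [
    {"name": "A", "gap": 6.0, "offset": 0.0, "speed": 28.0},
    {"name": "B", "gap": 12.0, "offset": 0.8, "speed": 28.0},
    {"name": "C", "gap": 12.0, "offset": 0.6, "speed": 33.0},
    {"name": "D", "gap": 11.0, "offset": 0.8, "speed": 29.0},
]

specs = [
    lambda c: c["gap"] - 10.0,         # 1. keep a safe following distance
    lambda c: 0.5 - abs(c["offset"]),  # 2. stay in lane
    lambda c: 30.0 - c["speed"],       # 3. respect the speed limit
]

chosen = [c["name"] for c in lexicographic_select(candidates, specs)]
print(chosen)
```

Candidate A is dropped first because it is the only one that violates the following-distance rule. Among the rest, C leaves the lane the least, so it wins even though it is the only one that breaks the speed limit. That is the behavior a strict priority ordering is supposed to produce: a lower-priority rule gives way before a higher-priority one does.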
They implement this using Model Predictive Path Integral (MPPI) control, a sampling-based approach that's well-suited to multimodal uncertainty because it can reason over distributions of possible futures rather than committing to a single prediction. The simulation results on autonomous driving scenarios show the framework handling conflicting objectives under realistic multimodal uncertainty, including cases where other vehicles have ambiguous intent.
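For readers who haven't met MPPI, the core update fits in a few lines. The sketch below is a bare-bones version on a one-dimensional toy problem. The single-integrator dynamics, the quadratic cost, and every parameter value are assumptions made for illustration; it includes none of the paper's STL costs, priority ordering, or multimodal predictions.

```python
import math
import random

def rollout_cost(x0, controls, goal, dt=0.1):
    """Cost of one rollout for a 1-D single integrator (toy dynamics)."""
    x, cost = x0, 0.0
    for u in controls:
        x += u * dt
        cost += (x - goal) ** 2 + 0.01 * u * u
    return cost

def mppi_step(x0, nominal, goal, rng, samples=500, sigma=1.0, lam=1.0):
    """One MPPI update: sample, score, and average with exponential weights."""
    # Sample noisy control sequences around the nominal plan.
    noises = [[rng.gauss(0.0, sigma) for _ in nominal] for _ in range(samples)]
    costs = [
        rollout_cost(x0, [u + e for u, e in zip(nominal, eps)], goal)
        for eps in noises
    ]
    # Low-cost rollouts get exponentially more weight; subtracting the
    # minimum cost keeps the exponentials numerically well behaved.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    # The new plan is the nominal plan plus the weighted-average perturbation.
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
        for t, u in enumerate(nominal)
    ]

rng = random.Random(0)
nominal = [0.0] * 10  # start from a "do nothing" plan
updated = mppi_step(0.0, nominal, goal=1.0, rng=rng)

old_cost = rollout_cost(0.0, nominal, 1.0)
new_cost = rollout_cost(0.0, updated, 1.0)
print(old_cost, new_cost)
```

Nothing here requires a gradient of the cost, which is why the method pairs naturally with awkward, non-smooth objectives. The price is the sample count: every control cycle pays for `samples` full rollouts.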
The honest limitation here is that validation is simulation-only. There are no physical hardware experiments in this paper. That's not a fatal flaw for a planning framework paper, but it's worth noting. Simulation scenarios, however realistic, don't fully capture the latency, sensor noise, and mechanical variability of actual vehicles. It's too early to say how the lexicographic priority ordering will perform in edge cases that simulation didn't anticipate.
Let me be direct about what we can and can't conclude from these two papers.
On the pdSTL side, the Crazyflie experiments provide genuine hardware validation. Quadcopters are not industrial manipulators or autonomous forklifts, but they're real physical systems with real uncertainty, and the framework held up. The linear-time complexity of the LSTM-style unfolding means computational cost scales reasonably as trajectory horizons increase, which matters for real-time control.
On the priority-ordered STL side, the MPPI implementation is computationally intensive by nature. MPPI works by sampling thousands of trajectories and computing a weighted average. That's fine on a GPU-equipped autonomous vehicle platform, but it raises questions about deployment on resource-constrained industrial hardware. The paper doesn't directly address this, and it remains unclear whether the approach is practical on embedded controllers with limited compute.
Both papers are using STL as a foundation, which is worth flagging. STL is powerful, but it requires specifications to be written correctly upfront. In industrial deployments, the hard part is often not the optimization algorithm but getting the specifications right in the first place. Garbage in, garbage out, even with formal guarantees.
From my time in hardware, the thing that always frustrated me about academic robotics papers was the gap between "we demonstrated this in simulation" and "we deployed this in a facility with 50 robots running 24 hours a day." That gap is real and it's large.
But the direction these two papers are pointing is genuinely useful for the industry. The problems they're solving (formal safety guarantees under uncertainty and principled handling of conflicting specifications) are exactly the ones that slow down certification of autonomous systems in manufacturing, logistics, and transportation.
Regulatory bodies and insurance underwriters increasingly want formal evidence that autonomous systems will behave safely under uncertainty. "We tested it a lot and it seemed fine" is not going to be sufficient as autonomous systems take on more consequential tasks. Frameworks like pdSTL and priority-ordered STL give you the mathematical structure to make formal arguments about safety, which is what certification processes actually need.
The autonomous vehicle angle in the second paper is perhaps the most immediately commercializable. The autonomous driving industry has been wrestling with conflicting objectives and uncertain predictions for years. A formally grounded approach to prioritizing specifications under multimodal uncertainty is directly applicable to the trajectory planning stacks that companies like Waymo, Zoox, and a dozen others are building.
For industrial robotics specifically, the belief-space approach in pdSTL is more relevant. Collaborative robots operating near humans need to account for uncertainty in human position and intent. Current approaches often use conservative fixed safety zones, which reduces productivity. A system that can formally reason about uncertainty and adjust its safety margins accordingly, while providing probabilistic guarantees, could meaningfully improve throughput without compromising safety.
Look, neither of these papers is ready to be dropped into a production system tomorrow. That's not what arXiv preprints are for. But they represent serious, technically rigorous work on problems that actually matter for the next generation of autonomous systems.
The pdSTL paper is the stronger of the two in terms of empirical validation, simply because it includes real hardware experiments. The priority-ordered STL paper is conceptually important but needs hardware validation to be convincing to engineers who've seen simulation results fall apart in the field.
What's encouraging is that both papers are working within the STL framework rather than proposing entirely new formalisms. That's a sign of a maturing subfield. The community is building on shared foundations rather than reinventing everything from scratch, which is how engineering progress actually happens.
The real test, as always, is production volume and edge cases. Simulations and controlled experiments reveal what researchers anticipated. Deployed systems reveal what they didn't. These papers are a meaningful step toward autonomous robots that can be trusted to follow complex rules under uncertainty. Whether they hold up when the real world throws something genuinely unexpected at them is a question that only deployment will answer.