Depth Estimation Is Getting Smarter, But Let's Talk About What That Actually Means for Industrial Vision
New research tackles the uncertainty problem in monocular depth sensing, and after 12 years of watching vision systems fail in warehouses, I have thoughts.
Photo credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
Three new papers dropped this month on depth estimation and visual SLAM, and honestly, it's the kind of progress that would've saved me a lot of headaches back when I was at Kuka trying to get bin-picking systems to work reliably.
Look, here's the thing. Monocular depth estimation (getting 3D information from a single camera) has been the holy grail for cost-conscious automation for years. Stereo vision works, LiDAR works better, but single cameras are cheap. The problem has always been that neural networks are confident idiots. They'll tell you something is 2.3 meters away with absolute certainty, right up until it's actually 4 meters away and your robot just crashed into a pallet.
The Uncertainty Problem
A team from what appears to be an academic robotics lab has put out work on something called UfM (Uncertainty from Motion), and I'll be honest, the approach is clever. Instead of running your neural network multiple times to figure out how confident it should be (which eats compute like nobody's business), they compare predictions across consecutive frames using Gaussian mixtures. The numbers they're claiming are impressive: 24-28% better calibration than ensemble methods, using 3% of the energy and running at 30 FPS on an Arm Cortex-A76.
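For the curious, here's a rough sketch of the general idea of turning frame-to-frame disagreement into an uncertainty number. To be clear, this is not the paper's algorithm. It assumes you already have several depth predictions for the same scene point, each with its own mean and variance, and it collapses them into one Gaussian by standard mixture moment matching. The names `DepthEstimate` and `fuse_mixture` are mine, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DepthEstimate:
    """One frame's depth prediction for a scene point (hypothetical type)."""
    mean: float          # predicted depth in meters
    variance: float      # per-frame predictive variance in m^2
    weight: float = 1.0  # relative trust in this frame


def fuse_mixture(estimates):
    """Collapse a Gaussian mixture into a single mean and variance.

    The fused variance contains the average per-frame variance plus the
    spread of the per-frame means, so frames that disagree with each
    other push the uncertainty up.
    """
    total = sum(e.weight for e in estimates)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(e.weight * e.mean for e in estimates) / total
    second_moment = sum(
        e.weight * (e.variance + e.mean ** 2) for e in estimates
    ) / total
    # Clamp tiny negative values caused by floating-point rounding.
    variance = max(second_moment - mean ** 2, 0.0)
    return mean, variance


# Three frames that agree: the fused variance stays near the per-frame value.
agree = [DepthEstimate(2.3, 0.01) for _ in range(3)]
# One frame suddenly says 4 m: the fused variance jumps.
disagree = [
    DepthEstimate(2.3, 0.01),
    DepthEstimate(2.4, 0.01),
    DepthEstimate(4.0, 0.01),
]
print(fuse_mixture(agree))
print(fuse_mixture(disagree))
```

The point of the second case is the "confident idiot" scenario: each frame claims a tight variance on its own, but the disagreement between frames is what tells you not to trust the 2.3 meters.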
Now, I called my old colleague Hans, who still works on vision systems, and his reaction was basically "show me the factory floor results." Which is fair. Academic benchmarks and real-world performance are different sports entirely.
Separately, there's PRISM-SLAM, which tackles the scale drift problem that's plagued monocular SLAM forever. If you've ever watched a robot slowly convince itself that a room is 20% larger than it actually is over the course of an hour, you know what I'm talking about. They're using something called a Plücker Ray-Distance Factor (don't ask me to explain the math, I'm semi-retired) to anchor observations in absolute space.
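I can't tell you how PRISM-SLAM actually defines its factor, so treat the following as background only. It's a minimal sketch of the two textbook ingredients the name suggests: a ray stored in Plücker coordinates (unit direction `d` plus moment `m = p × d` for any point `p` on the ray) and the distance from a 3D point `x` to that ray, which works out to `‖x × d − m‖`. The function names are hypothetical, and the 20% scale error is just the example from above.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def plucker_ray(origin, direction):
    """Return (d, m): unit direction and moment of the ray through origin."""
    n = norm(direction)
    if n == 0:
        raise ValueError("direction must be non-zero")
    d = tuple(c / n for c in direction)
    m = cross(origin, d)
    return d, m


def point_ray_distance(point, ray):
    """Perpendicular distance from a point to a Plücker line: ||x × d − m||."""
    d, m = ray
    r = cross(point, d)
    return norm(tuple(r[i] - m[i] for i in range(3)))


# A ray known in metric units: starts 1 m up, points along the x axis.
ray = plucker_ray((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))

# A landmark that truly sits on the ray has zero residual.
true_landmark = (5.0, 0.0, 1.0)
# The same landmark in a map that has drifted to 120% scale.
drifted_landmark = tuple(1.2 * c for c in true_landmark)

print(point_ray_distance(true_landmark, ray))
print(point_ray_distance(drifted_landmark, ray))
```

The drifted landmark no longer lies on the ray, so the residual is non-zero. A residual like that is the kind of signal an optimizer could use to pull the map back toward metric scale, assuming the ray itself is anchored in absolute units.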