Two New Papers Tackle the Same Old Problem: Teaching Robots What a Door Handle Actually Is
Researchers are getting creative with VR annotation and physics-aware scene graphs, but we've been here before.
Two papers dropped on arXiv this week that caught my attention, both trying to solve what I'd call the articulated parts problem. That's the fancy academic way of saying: how do you get a robot to understand that a cabinet door swings, a drawer slides, and a lid flips? When I was at KUKA, we spent months on a packaging line that kept jamming because the robot couldn't reliably grasp hinged lids. We ended up solving it with mechanical constraints and very precise fixturing. Not elegant, but it worked. These researchers are trying to do it the hard way, with perception.
The first paper, from a team that has made its code and data available, introduces something they call Geometric Primary Structure (GPS). The idea is to create an abstraction of how parts move, somewhere between the old pose-based methods (which require tons of manual labeling) and the newer affordance-based approaches (which track point motion but tend to produce noisy data). Their clever bit is using consumer VR headsets for annotation: about one minute per object sequence, they claim. That's actually pretty good. I remember when we had to manually teach every pick point on the KUKA KR 60, and that was just static geometry.
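For readers who want a concrete picture of "how parts move", here is a minimal Python sketch of the two textbook joint types behind the door-versus-drawer distinction: a revolute joint (rotation about a hinge axis) and a prismatic joint (sliding along an axis). This is not the paper's GPS representation, whose details are not described here; the function names and the example coordinates are hypothetical and chosen only for illustration.

```python
import math

def prismatic(point, axis, distance):
    """Slide a 3D point along a unit axis (drawer-style motion)."""
    return tuple(p + distance * a for p, a in zip(point, axis))

def revolute(point, pivot, axis, angle):
    """Rotate a 3D point about a unit axis passing through pivot
    (door-style motion), using Rodrigues' rotation formula."""
    # Vector from the hinge to the point.
    v = tuple(p - c for p, c in zip(point, pivot))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(axis, v))
    cross = (
        axis[1] * v[2] - axis[2] * v[1],
        axis[2] * v[0] - axis[0] * v[2],
        axis[0] * v[1] - axis[1] * v[0],
    )
    rotated = tuple(
        v[i] * cos_a + cross[i] * sin_a + axis[i] * dot * (1 - cos_a)
        for i in range(3)
    )
    return tuple(r + c for r, c in zip(rotated, pivot))

# Hypothetical handle position, 0.5 m from a vertical hinge at the origin.
handle = (0.5, 0.0, 0.0)
door_handle = revolute(handle, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
drawer_handle = prismatic(handle, (1.0, 0.0, 0.0), 0.2)
print(door_handle, drawer_handle)
```

The same handle point ends up in very different places depending on the joint type, which is why a manipulation policy needs some estimate of the joint before it pulls.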
They collected 41,000 frames across 234 objects in six part classes, then trained a model that takes a single RGB-D image and predicts how the articulated bits move. The results are decent: 73% success rate on manipulation tasks covering 270 initial states across 9 objects, with no domain-specific fine-tuning. Now, 73% sounds low if you're used to industrial automation where we aim for five nines, but for generalizable perception in unstructured environments? That's actually respectable. The question I have is whether this scales. Nine objects is a far cry from a warehouse full of mixed SKUs.
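To put the 73% figure in perspective, here is a back-of-the-envelope sketch in Python. It assumes the 270 initial states can be treated as 270 independent trials and rounds 73% of 270 to 197 successes; neither the exact success count nor the independence of trials is stated above, so treat both as assumptions. The Wilson score interval is a standard choice for a binomial proportion, not something taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

TRIALS = 270                      # initial states reported in the article
SUCCESSES = round(0.73 * TRIALS)  # assumed count: 73% of 270, rounded

low, high = wilson_interval(SUCCESSES, TRIALS)

# Failure rate at 73% success versus "five nines" (99.999%) reliability.
failure_ratio = (1 - 0.73) / (1 - 0.99999)

print(f"Plausible success range: {low:.1%} to {high:.1%}")
print(f"Failures relative to five nines: about {failure_ratio:,.0f}x")
```

Under those assumptions the plausible success range is roughly 67% to 78%, and the failure rate is about 27,000 times what a five-nines process would tolerate. That gap is the distance between a research benchmark and a production line.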
