The Segmentation Wars Are Getting Interesting, and Faster

Three new papers show we're finally solving the speed problem in 3D perception, and I've got some thoughts on what that means for the warehouse floor.

By Robert "Bob" Macintosh

11 hours ago · 3 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Look, I'll be honest: when I first saw these papers drop, my gut reaction was "finally." For years, the gap between what researchers could do in a lab and what we could actually deploy on a factory floor was, well, embarrassing. These new segmentation approaches might actually change that.

But let me complicate my own optimism here, because nothing in this industry is ever as simple as the abstracts make it sound.

The Speed Problem Was Always the Real Problem

When I was at Kuka, we spent months trying to get a vision system that could segment bins of mixed parts in real time. The algorithms existed. The accuracy was there. But running inference at anything approaching production speed? Forget it. We ended up with a cobbled-together solution using older, faster (and dumber) methods because the fancy stuff just couldn't keep up with cycle times.

That's why SpaCeFormer caught my attention. They're claiming 0.12 to 0.30 seconds per scene for open-vocabulary 3D instance segmentation. That's two to three orders of magnitude faster than the multi-stage pipelines we've been stuck with. For context, the old approach could take hundreds of seconds per scene. Hundreds. You can't run a warehouse like that.
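For anyone who wants to check the arithmetic behind "orders of magnitude," here is a rough sketch. The 0.12 and 0.30 second figures are the ones quoted above. The 100 to 300 second baseline is purely my assumption for what "hundreds of seconds" could mean, not a number from any paper, and the function name is made up for illustration.

```python
import math


def orders_of_magnitude(baseline_s: float, new_s: float) -> float:
    """Return log10 of the speedup factor between two per-scene times."""
    return math.log10(baseline_s / new_s)


# Per-scene times quoted in the article text.
fast_s, slow_s = 0.12, 0.30

# ASSUMPTION: "hundreds of seconds" read as 100 s to 300 s, for illustration only.
baseline_low_s, baseline_high_s = 100.0, 300.0

# Most conservative pairing: fastest old pipeline vs. slowest new result.
smallest_gap = orders_of_magnitude(baseline_low_s, slow_s)
# Most generous pairing: slowest old pipeline vs. fastest new result.
largest_gap = orders_of_magnitude(baseline_high_s, fast_s)

# Throughput if a single worker did nothing but segment scenes back to back.
scenes_per_hour_slow = 3600 / slow_s
scenes_per_hour_fast = 3600 / fast_s

print(f"gap: {smallest_gap:.1f} to {largest_gap:.1f} orders of magnitude")
print(f"throughput: {scenes_per_hour_slow:.0f} to {scenes_per_hour_fast:.0f} scenes/hour")
```

Under that assumed baseline the gap lands between two and four orders of magnitude, which is in the same territory as the claim. Swap in your own baseline numbers before drawing conclusions for a specific line.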

The trick seems to be ditching the proposal-based approach entirely. Instead of generating region proposals and then classifying them (the standard playbook), they're using Morton-curve serialization (ordering the 3D points along a Z-order space-filling curve, so points that sit close together in space tend to land close together in the sequence) to maintain spatial coherence while predicting masks directly from learned queries. I called my old colleague at Siemens to sanity-check whether this actually works in practice. His take was cautiously optimistic, though he noted that benchmark performance and real-world deployment are different beasts.
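If Morton ordering is new to you, the general idea fits in a few lines. This is a minimal sketch of textbook Z-order encoding, not SpaCeFormer's actual code, which I haven't seen. The names `morton3d` and `serialize`, the `cell` size, and the 10-bit grid are all my own assumptions for illustration.

```python
def morton3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three grid indices into one Z-order code."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)      # x bit goes to position 3b
        code |= ((iy >> b) & 1) << (3 * b + 1)  # y bit goes to position 3b + 1
        code |= ((iz >> b) & 1) << (3 * b + 2)  # z bit goes to position 3b + 2
    return code


def serialize(points, cell: float = 0.5, bits: int = 10):
    """Return point indices sorted along the Morton curve.

    Points are snapped to a grid of `cell`-sized voxels measured from the
    scene's minimum corner; indices are clamped to what fits in `bits` bits.
    """
    mins = [min(p[axis] for p in points) for axis in range(3)]
    limit = (1 << bits) - 1

    def key(i: int) -> int:
        cells = [min(int((points[i][axis] - mins[axis]) / cell), limit)
                 for axis in range(3)]
        return morton3d(cells[0], cells[1], cells[2], bits)

    return sorted(range(len(points)), key=key)


# Hypothetical toy scene: three points near the origin and one far corner.
points = [(4.2, 4.2, 4.2), (0.0, 0.0, 0.0), (0.6, 0.0, 0.0), (0.0, 0.6, 0.0)]
order = serialize(points, cell=0.5)
print(order)  # → [1, 2, 3, 0]
```

The three neighbors near the origin end up next to each other in the sequence and the far corner lands at the end. That locality is what lets a sequence model treat a point cloud a bit like a sentence. How the paper actually quantizes and feeds the sequence into its decoder is something to check against the source.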

