The Grasping Problem: Why Your Robot Still Can't Pick Up a Coffee Mug

Two new papers tackle robotic grasping from opposite directions, and honestly, both approaches reveal how far we still have to go.

9 June 2026 · 4 min read

86.4% grasp stability sounds pretty good until you remember what it means: in roughly one of every seven attempts, your robot drops whatever it's holding.
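To make the one-in-seven figure concrete, here is a quick back-of-the-envelope check. It treats 86.4% as a per-attempt success rate and assumes attempts are independent. Both are my simplifications, not claims from either paper, and the function names are just illustrative.

```python
def failure_odds(success_rate: float) -> float:
    """Return N such that, on average, one in every N attempts fails."""
    return 1.0 / (1.0 - success_rate)


def streak_probability(success_rate: float, attempts: int) -> float:
    """Probability that `attempts` grasps in a row all hold.

    Assumes each attempt is independent, which real robots rarely honor.
    """
    return success_rate ** attempts


stability = 0.864  # the headline figure, read as a per-attempt success rate

print(f"one failure in every {failure_odds(stability):.1f} attempts")
print(f"chance of 10 clean grasps in a row: {streak_probability(stability, 10):.1%}")
```

Under those assumptions, fewer than one in four ten-grasp sequences finishes without a drop, which is why a number that looks fine in isolation still hurts in any multi-step task.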

I've been thinking about this a lot lately. We've got humanoids that can do backflips, AI models that pass the bar exam, and yet the simple act of picking something up remains genuinely hard for robots. Two papers dropped this week that approach the problem from completely different angles, and I think they're worth examining together because they reveal something interesting about where the field is stuck.

The reconstruction approach

The first paper, GraspFoM from a team working with 3D foundation models, takes what I'd call the "understand the object first" approach. The core insight is that robots often fail at grasping because they're working with incomplete information. You see half a mug, you guess where the handle is, you miss.

GraspFoM uses something called SAM3D (a 3D foundation prior) to build what the researchers call a "shared 3D object latent." Basically, the robot reconstructs a full 3D model of the object from partial observations, then uses that reconstruction to predict grasp poses. The clever bit is that these two tasks (reconstruction and grasp prediction) share the same underlying representation, so improvements in one help the other.
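If the "shared latent" idea feels abstract, here is a toy sketch of the shape of it: one encoder produces a single object vector, and two separate heads read from it. To be clear, this is not GraspFoM's architecture. Every name, dimension, and weight below is hypothetical, the weights are random rather than trained, and SAM3D is not involved at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are hypothetical, chosen only to keep the example small.
LATENT_DIM = 32   # size of the shared object latent
NUM_POINTS = 64   # points in the reconstructed object
POSE_DIM = 7      # grasp pose: position (3) + unit quaternion (4)

# Random weights stand in for trained networks.
W_enc = rng.normal(scale=0.1, size=(3, LATENT_DIM))
W_recon = rng.normal(scale=0.1, size=(LATENT_DIM, NUM_POINTS * 3))
W_grasp = rng.normal(scale=0.1, size=(LATENT_DIM, POSE_DIM))


def encode(partial_points: np.ndarray) -> np.ndarray:
    """Map a partial point cloud (N, 3) to one latent vector.

    Per-point features followed by max pooling, so the result does not
    depend on point order.
    """
    features = np.tanh(partial_points @ W_enc)  # (N, LATENT_DIM)
    return features.max(axis=0)                 # (LATENT_DIM,)


def reconstruct(latent: np.ndarray) -> np.ndarray:
    """Reconstruction head: decode the latent into a full (NUM_POINTS, 3) cloud."""
    return (latent @ W_recon).reshape(NUM_POINTS, 3)


def predict_grasp(latent: np.ndarray) -> np.ndarray:
    """Grasp head: decode the same latent into a 7-D pose."""
    pose = latent @ W_grasp
    position, quat = pose[:3], pose[3:]
    quat = quat / (np.linalg.norm(quat) + 1e-8)  # normalise to a unit quaternion
    return np.concatenate([position, quat])


# Usage: a fake partial view of an object, e.g. half a mug.
partial_view = rng.normal(size=(100, 3))
latent = encode(partial_view)
full_shape = reconstruct(latent)
grasp_pose = predict_grasp(latent)
print(latent.shape, full_shape.shape, grasp_pose.shape)
```

The point of the layout is that both heads read the same `latent`. In a trained version of something like this, a loss on either head would push gradients into the shared encoder, which is the mechanism by which better reconstruction could plausibly help grasp prediction.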

I initially thought this was just adding complexity for complexity's sake. But after reading through the ablation studies, I'm less sure. The reconstruction-aware scorer they introduce does seem to provide grounded geometric cues that improve grasp success. I should note, though, that the paper doesn't provide real-world success rates, only simulation benchmarks. That gap always makes me nervous.
