VLA Models Keep Failing in Ways We Should've Expected

A batch of new research papers shows that vision-language-action models break down in predictable, clusterable ways. Anyone who's deployed industrial robots could've told you this.

By Robert "Bob" Macintosh

1 hour ago · 4 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Look, here's the thing: I've been watching the VLA hype cycle with a mix of interest and, I'll be honest, some skepticism. These vision-language-action models are supposed to be the future of robot control. Tell a robot what to do in plain English, it figures out the rest. Sounds great on paper.

But a cluster of new research papers dropped this month that confirm what anyone who's spent time on a factory floor already knows. These systems fail. They fail in ways that are predictable if you know where to look. And the benchmarks we've been using to evaluate them have been hiding the problem.

The numbers

Let me walk through what these researchers actually found, because the specifics matter.

A team behind something called FATE-VLA reframed how we test these models. Instead of randomly sampling test scenarios (which is what most benchmarks do), they actively hunted for failures. The results weren't pretty. On NVIDIA's GR00T-N1.6 model, the success rate dropped from 64.4% to 34.7% once they started looking for edge cases. That's a 29.7-point drop, nearly half the apparent capability evaporating when you stress-test it properly.
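To make the difference between random sampling and failure hunting concrete, here is a toy sketch. It is not the FATE-VLA method, which isn't described here. Everything in it is an assumption for illustration: the two-parameter scenario, the `FAILURE_THRESHOLD`, the `margin` signal, and the greedy search in `hunt_failure` are all hypothetical stand-ins.

```python
import random

# Hypothetical: the toy "policy" fails when the two scenario
# parameters (say, object offset and lighting) sum above this value.
FAILURE_THRESHOLD = 1.4


def margin(scenario):
    """Distance from failure; a negative value means the toy policy fails."""
    x, y = scenario
    return FAILURE_THRESHOLD - (x + y)


def succeeds(scenario):
    return margin(scenario) >= 0


def random_scenario(rng):
    """Sample both parameters uniformly from [0, 1], as a random benchmark would."""
    return (rng.random(), rng.random())


def hunt_failure(start, rng, steps=30, step_size=0.1):
    """Greedy local search: keep any perturbation that reduces the margin."""
    best = start
    for _ in range(steps):
        candidate = tuple(
            min(1.0, max(0.0, v + rng.uniform(-step_size, step_size)))
            for v in best
        )
        if margin(candidate) < margin(best):
            best = candidate
    return best


def success_rate(scenarios):
    return sum(succeeds(s) for s in scenarios) / len(scenarios)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
starts = [random_scenario(rng) for _ in range(500)]

random_rate = success_rate(starts)
hunted_rate = success_rate([hunt_failure(s, rng) for s in starts])

print(f"random sampling: {random_rate:.1%}, failure hunting: {hunted_rate:.1%}")
```

The policy is identical in both measurements. Only the way test scenarios get picked changes, and that alone is enough to move the headline number.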

Another paper, SafeVLA-Bench, looked at something different: what happens when a robot "succeeds" but does it unsafely? Turns out 36 to 56 percent of successful runs in kitchen manipulation tasks violated at least one safety requirement. The robot completed the task, sure, but it applied excessive force, knocked over nearby objects, or put itself into weird self-contact configurations.
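Here is a minimal sketch of how you might separate "succeeded" from "succeeded safely" when scoring runs. It assumes a hypothetical log format: the `Run` record, the violation labels, and the sample data are made up for illustration and are not taken from SafeVLA-Bench.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    success: bool
    # Hypothetical labels, e.g. "excessive_force", "collateral_contact", "self_contact"
    violations: list[str] = field(default_factory=list)


def summarize(runs):
    """Report raw success alongside success with zero safety violations."""
    successes = [r for r in runs if r.success]
    unsafe = [r for r in successes if r.violations]
    safe_count = len(successes) - len(unsafe)
    return {
        "success_rate": len(successes) / len(runs),
        "unsafe_share_of_successes": len(unsafe) / len(successes) if successes else 0.0,
        "safe_success_rate": safe_count / len(runs),
    }


# Made-up data: 10 runs, 8 complete the task, 4 of those break a safety rule.
runs = (
    [Run(True)] * 4
    + [Run(True, ["excessive_force"])] * 2
    + [Run(True, ["collateral_contact", "self_contact"])] * 2
    + [Run(False)] * 2
)

report = summarize(runs)
for name, value in report.items():
    print(f"{name}: {value:.0%}")
```

In this made-up example, an 80% success rate becomes a 40% safe success rate once unsafe completions stop counting as wins.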

When I was at Kuka, we had a term for this. We called it "technically correct but practically useless." A palletizing cell that stacks boxes but occasionally crushes one isn't a working system. It's a liability.

