VLA Models Are Getting Faster, But Nobody's Talking About the Real Problem

Six new papers on vision-language-action inference speed landed this month. Most coverage missed what actually matters for factory floors.

By Robert "Bob" Macintosh

3 hours ago · 3 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

Most of the coverage I've seen on these new VLA papers focuses on the benchmark numbers. Success rates up, latency down, everyone's happy. But look, here's the thing: I spent 12 years watching promising lab demos die the moment they hit a production environment, and I'm seeing the same patterns here.

Let me back up. Vision-language-action models are the hot thing in robotics right now. You show a robot a camera feed, give it a natural language instruction, and it figures out what to do. The promise is obvious. The reality is messier.

The Inference Problem Nobody Wants to Admit

Six papers crossed my desk this month, all tackling the same core issue: VLA models are too slow for real-time control. One arXiv preprint covers threading optimization for agricultural manipulation. Another, also on arXiv, tackles weight-aware grasping. There's work on deformable object manipulation, on on-device planning, and two papers on something called action chunking (PACE and Mixture of Horizons).

All solid work. But here's what nobody's saying out loud: these solutions are mostly about hiding latency, not eliminating it.

Action chunking, for instance, predicts a sequence of future actions and executes them open-loop before checking back with the model. It's clever. When I was at KUKA, we did something similar with trajectory pre-computation on welding cells, though we didn't have fancy names for it. The PACE paper admits that success is "strongly task-dependent and non-monotonic with respect to the execution horizon." Translation: sometimes it works, sometimes it doesn't, and you won't know which until you try.
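If you want to see the mechanics, here's a minimal Python sketch of chunked execution. It's a toy under my own assumptions, not the method from PACE or Mixture of Horizons: `run_chunked`, `toy_policy`, and `toy_step` are names I made up for illustration, and the "robot" is a single number moving toward a target. What it shows is the trade: a longer horizon means fewer slow model calls, but every action inside a chunk runs without a fresh look at the world.

```python
from typing import Callable, List, Tuple

TARGET = 10.0    # hypothetical goal position for the toy 1-D "robot"
MAX_STEP = 1.0   # hypothetical per-step motion limit


def toy_policy(obs: float, horizon: int) -> List[float]:
    """Stand-in for a slow VLA model: plan `horizon` actions from one observation."""
    actions = []
    predicted = obs  # the policy only imagines future states, it never sees them
    for _ in range(horizon):
        action = max(-MAX_STEP, min(MAX_STEP, TARGET - predicted))
        actions.append(action)
        predicted += action
    return actions


def toy_step(obs: float, action: float) -> float:
    """Stand-in for the real world: apply one action, return the new observation."""
    return obs + action


def run_chunked(
    policy: Callable[[float, int], List[float]],
    step: Callable[[float, float], float],
    obs: float,
    horizon: int,
    total_steps: int,
) -> Tuple[float, int]:
    """Run `total_steps` actions, querying the policy once per chunk."""
    model_calls = 0
    executed = 0
    while executed < total_steps:
        chunk = policy(obs, horizon)  # the one expensive inference per chunk
        model_calls += 1
        for action in chunk[: total_steps - executed]:
            obs = step(obs, action)   # open loop: no re-query inside the chunk
            executed += 1
    return obs, model_calls


if __name__ == "__main__":
    for h in (1, 4):
        final_obs, calls = run_chunked(toy_policy, toy_step, 0.0, h, 12)
        print(f"horizon={h}: final={final_obs}, model_calls={calls}")
```

In this toy nothing disturbs the robot mid-chunk, so both horizons land on the target and the longer one simply makes fewer model calls. On a real cell, anything that changes while a chunk is running goes unseen until the next query, which is one plausible reading of why the right horizon ends up depending on the task.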


