Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
I've seen this movie before. Back in the late 2000s, everyone was convinced we were two years away from self-driving cars because the demos looked so good. The cars could navigate a desert course! They could merge onto highways! What nobody talked about was what happened when you tried to make them do both at the same time, or what happened when conditions changed even slightly from the training data. We're watching the same script play out with robot foundation models, and a batch of new research papers this week makes the problem painfully clear.
The headline finding comes from a team studying vision-language-action (VLA) models, those supposedly general-purpose robot brains that companies keep promising will make robots as flexible as humans. According to the arXiv preprint, when these VLA models try to learn new skills from real-world data, they suffer what the researchers call "significant catastrophic forgetting." In plain English: teach a robot to fold towels and it forgets how to pick up boxes. This isn't a minor tuning problem. This is a fundamental limitation that makes the entire premise of general-purpose robots look shaky.
The researchers built a real-world dataset with four sequential manipulation tasks (spanning rigid object pick-and-place, contact-rich pressing, and deformable object folding) and found that current approaches just can't handle learning them in sequence without losing previously learned behaviors. They tested experience replay, which is basically making the robot practice old skills while learning new ones, and found it helps but isn't a magic bullet. The paper's contribution is showing exactly which implementation factors matter, which is useful, but call me old-fashioned: I think we're still treating symptoms instead of the disease.
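For anyone who hasn't run into experience replay, here is a minimal sketch of the idea in Python. It is a toy illustration, not the paper's implementation: the `ReplayBuffer` class, the `mixed_batch` helper, the use of reservoir sampling, and the 25% replay fraction in the usage note are all assumptions chosen for the example.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of examples from earlier tasks.

    ASSUMPTION: reservoir sampling is used so every example seen so far
    has an equal chance of staying in the buffer. The paper may use a
    different retention scheme.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Draw up to k distinct stored examples.
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_task_data, buffer, batch_size, replay_fraction, rng):
    """Build one training batch that mixes new-task and replayed examples."""
    n_replay = min(int(batch_size * replay_fraction), len(buffer.items))
    n_new = batch_size - n_replay
    # New-task examples are drawn with replacement to keep the sketch simple.
    batch = [rng.choice(new_task_data) for _ in range(n_new)]
    batch += buffer.sample(n_replay)
    rng.shuffle(batch)
    return batch
```

With a batch size of 16 and a replay fraction of 0.25, each batch would carry 4 old-task examples and 12 new-task ones. How large the buffer is, how it is filled, and what fraction gets replayed are exactly the kind of implementation factors the paper says make the difference.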
The industry's response to the real-world data problem has been predictable: retreat to simulation. A new paper introduces something called GE-Sim 2.0, a "closed-loop video world simulator" that promises to let robots learn from simulated rollouts instead of expensive real-world demonstrations. According to the arXiv preprint, the system was trained on thousands of hours of real robot data and can generate a 25-frame rollout in 2.3 seconds on a single H100 GPU. The researchers claim policies trained against these simulated rollouts "translate into measurable real-world gains."
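To put the quoted speed in perspective, here is a back-of-the-envelope calculation. The two constants are the figures cited above; the workload of one million rollouts is a made-up number for illustration, not something from the paper, and the function names are hypothetical.

```python
# Figures quoted above for GE-Sim 2.0 on a single H100 GPU.
FRAMES_PER_ROLLOUT = 25
SECONDS_PER_ROLLOUT = 2.3


def frames_per_second():
    """Generation throughput implied by the quoted figures."""
    return FRAMES_PER_ROLLOUT / SECONDS_PER_ROLLOUT


def gpu_hours(num_rollouts):
    """Single-GPU hours needed to generate num_rollouts rollouts back to back."""
    return num_rollouts * SECONDS_PER_ROLLOUT / 3600.0


if __name__ == "__main__":
    print(f"{frames_per_second():.2f} frames per second")
    # ASSUMPTION: one million rollouts is an illustrative workload only.
    print(f"{gpu_hours(1_000_000):.0f} GPU-hours for 1,000,000 rollouts")
```

That works out to roughly 10.9 generated frames per second, or about 639 single-GPU hours for a million rollouts. Whether that is fast or slow relative to real time depends on the frame rate of the video, which isn't stated here.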
I want to believe this. I really do. The economics of robot learning basically require simulation to work, because you can't have humans teleoperating robots for millions of hours to generate training data. But "measurable real-world gains" is doing a lot of heavy lifting in that sentence. The paper doesn't say how much gain, or under what conditions, or how the gains hold up when the real world throws something unexpected at the robot. These details matter! The sim-to-real gap has been the graveyard of robotics promises for decades.
There's a related paper on humanoid robots that's more honest about the underlying problem. HumanoidMimicGen, described in another arXiv preprint, is trying to automatically generate training data for humanoid loco-manipulation (walking while manipulating objects, which turns out to be really hard). The researchers found that policies trained with their generated data outperformed those trained only on real-world data by 20%. That's a meaningful number! But read the fine print: this is in simulation, on their own benchmark, and the humanoid action space is so high-dimensional that nobody really knows how well any of this transfers to physical hardware.
One of the more interesting papers this week tackles something most robot AI research conveniently ignores: touch. A team working on dexterous manipulation introduces what they call Center-of-Pressure (CoP), a tactile representation that tries to preserve dense contact information while remaining robust enough for sim-to-real transfer. According to the arXiv preprint, policies using this representation achieved zero-shot sim-to-real transfer on a multi-fingered hand for tasks like peg-in-hole insertion and ball balancing.
This is actually impressive if it holds up. The reason most robot learning research ignores touch is that tactile sensors are wildly inconsistent between simulation and reality. The paper claims they solved this with a sensor calibration scheme based on differentiable dynamics, which lets them estimate sensor orientations without ground-truth force measurements. Whether this works outside their specific setup remains unclear, but at least they're attacking a real problem instead of pretending it doesn't exist.
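For intuition about what a center-of-pressure signal looks like, here is the textbook definition in code: the force-weighted average of the contact locations on a sensor pad. This is a sketch of the general concept only. The paper's actual representation may differ, and the taxel layout, the function name, and the `eps` threshold are assumptions made for the example.

```python
def center_of_pressure(positions, normal_forces, eps=1e-9):
    """Force-weighted mean contact location on a 2D tactile pad.

    positions: list of (x, y) taxel coordinates (hypothetical layout).
    normal_forces: one non-negative force reading per taxel.
    Returns ((x, y), total_force), or None when there is no contact.
    """
    if len(positions) != len(normal_forces):
        raise ValueError("need exactly one force reading per taxel")
    total = sum(normal_forces)
    if total <= eps:
        return None  # no measurable contact, so the location is undefined
    x = sum(f * p[0] for p, f in zip(positions, normal_forces)) / total
    y = sum(f * p[1] for p, f in zip(positions, normal_forces)) / total
    return (x, y), total
```

On a four-taxel pad with corners at (0, 0), (1, 0), (0, 1), and (1, 1), equal forces put the center of pressure at (0.5, 0.5). Load only the right-hand taxels and it shifts to x = 1. The appeal for sim-to-real transfer is that a location and a total force are far less sensitive to per-taxel quirks than the raw readings are.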
The key findings from this week's research, for those keeping score:
VLA models suffer catastrophic forgetting when learning sequential real-world tasks
Experience replay helps but isn't sufficient; specific implementation factors matter more than the technique itself
New simulation frameworks can generate a 25-frame rollout in 2.3 seconds on a single H100 GPU, but real-world transfer remains the bottleneck
Synthetic data generation for humanoids shows a 20% improvement over real-world-only training in simulation benchmarks
Physics-grounded tactile representations enable zero-shot sim-to-real transfer for specific dexterous tasks
Cluttered scene manipulation remains extremely difficult, with real-world success rates around 50% even for state-of-the-art methods
That last point comes from a paper on extrinsic dexterity, which is the fancy term for using environmental contact to manipulate objects. According to the arXiv preprint, their method achieved about a 50% success rate across 10 cluttered real-world scenes. Fifty percent! These are researchers who know what they're doing, working in controlled lab conditions, and they're hitting coin-flip odds. The paper frames this as a positive result because it beats prior methods by 25%, which it does, but let's not pretend this is anywhere close to deployment-ready.
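It's also worth asking how firm a number like 50% is. The sketch below computes a Wilson score interval, a standard way to put error bars on a success rate. The trial count is an assumption: the number of attempts per scene isn't given here, so the usage note supposes one attempt in each of the 10 scenes purely for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width
```

Under that assumed count, `wilson_interval(5, 10)` gives a range of roughly 24% to 76%. More trials would tighten it, but the general point stands: a headline success rate from a small number of real-world runs carries wide error bars.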
There's one more paper worth mentioning, called BORA, which tries to bridge offline learning with online adaptation for dexterous manipulation. The approach uses human-in-the-loop correction during online learning to fix execution errors, which is basically admitting that the robot can't figure it out on its own. According to the arXiv preprint, this achieves a 33% absolute increase in success rate over pure imitation learning. That's substantial! But the fact that we need humans constantly intervening to make robot learning work tells you something about where we actually are versus where the press releases say we are.
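The word "absolute" matters in that claim, and the difference is easy to show. In the sketch below, the 40% baseline is a hypothetical number chosen for illustration; the baseline success rate isn't given here, and the function names are made up.

```python
def with_absolute_gain(baseline, points):
    """Success rate after adding percentage points (0.33 means +33 points)."""
    return min(baseline + points, 1.0)


def with_relative_gain(baseline, fraction):
    """Success rate after a proportional improvement (0.33 means 1.33x)."""
    return min(baseline * (1 + fraction), 1.0)


if __name__ == "__main__":
    baseline = 0.40  # ASSUMPTION: hypothetical baseline, not from the paper
    print(f"absolute +33 points: {with_absolute_gain(baseline, 0.33):.1%}")
    print(f"relative +33%:       {with_relative_gain(baseline, 0.33):.1%}")
```

From that assumed 40% baseline, a 33-point absolute gain lands at 73%, while a 33% relative gain lands at only about 53%. When a summary quotes an improvement without saying which kind it is, as with the 25% figure for the extrinsic dexterity paper, the real effect could be either.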
I've been covering tech long enough to recognize the pattern. We're in the phase where the demos look magical, the benchmarks keep improving, and everyone's convinced the breakthrough is imminent. The young founders raising money on these ideas are smart and hardworking and genuinely believe they're close. Maybe they are! But the catastrophic forgetting problem isn't a bug to be patched. It's a fundamental challenge in how neural networks learn, and the fact that it shows up so clearly in real-world robot learning suggests we're still missing something important about how to build systems that actually generalize.
The researchers studying these problems are doing valuable work. They're being honest about limitations, publishing negative results, and building the foundations that might eventually lead somewhere useful. That's how science is supposed to work. What I'm less patient with is the gap between what the research actually shows and what gets repeated in investor decks and product announcements. A 50% success rate in a lab is not "robots are ready for your warehouse." A 20% improvement in simulation is not "humanoids will replace human workers by 2027."
But what do I know. If you want to argue about it, my email's on the about page.