Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
Here's something that should be simple: a humanoid robot picks up a six-kilogram box from the floor and places it on a shelf.
It's the kind of task a warehouse worker does hundreds of times a day without thinking. But for robots, this remains genuinely hard. Two papers published this week on arXiv tackle different pieces of this problem, and honestly, reading them back to back tells you a lot about where humanoid manipulation actually stands.
The first paper, which introduces SplitAdapter, addresses something I initially thought was a solved problem: how do you train a robot in simulation and have it work in the real world when the object it's carrying changes weight?
Turns out, it's messier than I expected.
When a humanoid picks up a 2kg box versus a 6kg box, everything changes. The robot's center of mass shifts. Its joints experience different torques. The timing of its steps needs adjustment. And here's the tricky part: these dynamics interact with each other in ways that are hard to disentangle.
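To get a feel for the magnitudes, here is a back-of-the-envelope statics calculation. Everything in it is an assumption for illustration: a hypothetical 60 kg humanoid, a box held 0.4 m in front of the hip and shoulder line, and no arm mass or acceleration. None of these numbers come from the SplitAdapter paper.

```python
# Back-of-the-envelope statics for a humanoid holding a box.
# All masses and distances are hypothetical illustration values,
# not figures from the SplitAdapter paper.

G = 9.81  # gravitational acceleration, m/s^2


def combined_com_x(robot_mass, robot_com_x, box_mass, box_x):
    """Horizontal position (m) of the combined robot + box center of mass."""
    total = robot_mass + box_mass
    return (robot_mass * robot_com_x + box_mass * box_x) / total


def static_joint_torque(box_mass, lever_arm):
    """Gravity torque (N*m) the box adds at a joint `lever_arm` metres away horizontally.

    Ignores the arm's own mass and any acceleration: this is only the
    static contribution of the box itself.
    """
    return box_mass * G * lever_arm


ROBOT_MASS = 60.0  # kg, hypothetical humanoid with its COM at x = 0
REACH = 0.4        # m, box held this far in front of the hip/shoulder line

for box_mass in (2.0, 6.0):
    com_shift = combined_com_x(ROBOT_MASS, 0.0, box_mass, REACH)
    torque = static_joint_torque(box_mass, REACH)
    print(f"{box_mass:.0f} kg box: COM shifts {com_shift * 100:.1f} cm forward, "
          f"shoulder torque ~ {torque:.1f} N*m")
```

Under these assumptions it prints a forward shift of about 1.3 cm and 7.8 N*m for the 2 kg box, versus 3.6 cm and 23.5 N*m for the 6 kg box. Tripling the load triples the static torque, while the center-of-mass shift grows slightly less than threefold because the total mass grows too.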
Previous approaches tried to handle this by compressing all the relevant information (object weight, robot dynamics, contact forces) into a single latent representation. The robot would, in theory, figure out what mattered. But the SplitAdapter team found this breaks down under heavy loads. The representation gets muddy, and the robot's performance degrades exactly when you need it most.
Their solution is to factor the problem explicitly. One encoder handles object and load awareness; another handles robot dynamics. They train these separately with different objectives, then combine them using Feature-wise Linear Modulation, or FiLM (a technique that conditions a network by applying a learned per-feature scale and shift, here letting each factor influence the robot's behavior without interfering with the other).
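The structural difference from the single-latent approach can be sketched in plain Python. This is a hypothetical illustration, not the SplitAdapter code: the layer sizes, the observation vectors, and the choice to apply one FiLM layer per factor in sequence are all my assumptions, and the weights are random and untrained. What it does show is the FiLM operation itself (scale and shift each feature by amounts computed from a conditioning vector) and the idea of giving each factor its own encoder and its own modulation path.

```python
import random

random.seed(0)


def linear(weights, bias, x):
    """y = W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_linear(out_dim, in_dim):
    """Randomly initialised (untrained) linear layer, for illustration only."""
    w = [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b


class FiLM:
    """Feature-wise Linear Modulation: h' = gamma(c) * h + beta(c), element-wise."""

    def __init__(self, cond_dim, feat_dim):
        self.gamma = random_linear(feat_dim, cond_dim)
        self.beta = random_linear(feat_dim, cond_dim)

    def __call__(self, h, cond):
        gamma = linear(*self.gamma, cond)
        beta = linear(*self.beta, cond)
        # Scale is parameterised as 1 + gamma, so with zero biases and a zero
        # conditioning vector the layer leaves the features unchanged.
        return [(1.0 + g) * hi + b for g, hi, b in zip(gamma, h, beta)]


class FactorizedAdapter:
    """Hypothetical sketch: two separate encoders, each injected via its own FiLM layer."""

    def __init__(self, load_dim, dyn_dim, latent_dim, feat_dim):
        self.load_encoder = random_linear(latent_dim, load_dim)  # object/load factor
        self.dyn_encoder = random_linear(latent_dim, dyn_dim)    # robot-dynamics factor
        self.film_load = FiLM(latent_dim, feat_dim)
        self.film_dyn = FiLM(latent_dim, feat_dim)

    def __call__(self, policy_features, load_obs, dyn_obs):
        z_load = linear(*self.load_encoder, load_obs)  # load embedding
        z_dyn = linear(*self.dyn_encoder, dyn_obs)     # dynamics embedding
        h = self.film_load(policy_features, z_load)    # modulate by load factor
        return self.film_dyn(h, z_dyn)                 # then by dynamics factor


adapter = FactorizedAdapter(load_dim=3, dyn_dim=4, latent_dim=8, feat_dim=16)
features = [0.5] * 16             # stand-in for the base policy's hidden features
load_obs = [6.0, 0.6, 0.0]        # hypothetical: box mass (kg), lift height (m), padding
dyn_obs = [0.1, -0.2, 0.05, 0.0]  # hypothetical robot-dynamics observations
out = adapter(features, load_obs, dyn_obs)
print(len(out))  # 16
```

The separate training objectives the paper describes are not shown. In a single-latent design, `load_obs` and `dyn_obs` would instead be concatenated and fed through one shared encoder.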
The results are genuinely interesting. On lifting a 6kg box to a 60cm height, SplitAdapter showed the largest improvements over baseline methods. The abstract doesn't give exact success rates, but the authors report "the largest improvements under heavy-load conditions," which suggests the baseline was struggling significantly.
What I'm less sure about: the real-world deployment is described only vaguely. How many trials? What failure modes? The abstract doesn't say, and I haven't had time to dig into the full paper yet.
The second paper takes a completely different approach. LDA-1B is a 1-billion-parameter foundation model trained on what the authors call EI-30k, a dataset of over 30,000 hours of human and robot trajectories.
You might be wondering: why does a robot need to watch humans? The argument is that dynamics knowledge transfers. When a human picks up a heavy object, the physics of that interaction (how weight shifts, how balance is maintained) contains information relevant to robots, even if the embodiment is different.
This is a bet that's becoming more common in robotics. Instead of throwing away "low-quality" data that doesn't match your specific robot, you try to extract the useful parts. LDA-1B claims to actually benefit from trajectories that other methods would discard as harmful to training.
The performance claims are substantial. The paper reports outperforming prior methods (they specifically call out π₀.₅) by 21% on contact-rich tasks, 48% on dexterous tasks, and 23% on long-horizon tasks. Those are big numbers, tbh. Though I should note these are simulation benchmarks, and the real-world results aren't broken down as clearly.
One technical detail worth noting: they predict in DINO latent space rather than pixel space. This avoids what they call "redundant appearance modeling": the model doesn't waste capacity predicting pixel-level details, like whether a box is red or blue, that don't change how the box should be handled. It remains unclear how much of the performance gain comes from this architectural choice versus the scale of the training data.
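A toy example makes the pixel-versus-latent distinction concrete. The `toy_encoder` below is a hand-written stand-in for a frozen pretrained encoder like DINO, not DINO itself: it keeps where the object is and throws color away entirely, which exaggerates what learned features do. The scenes are one-row images, and all names here are hypothetical.

```python
# Toy illustration of computing a prediction loss in feature space vs pixel space.
# `toy_encoder` is a hand-written stand-in for a frozen pretrained encoder such as
# DINO; it is NOT DINO and simply keeps shape (where the object is), dropping color.

BACKGROUND = (0.0, 0.0, 0.0)
RED, BLUE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)


def make_scene(width, box_start, box_len, color):
    """A one-row 'image': background pixels with a colored box somewhere in it."""
    return [color if box_start <= i < box_start + box_len else BACKGROUND
            for i in range(width)]


def pixel_mse(a, b):
    """Mean squared error over every RGB channel of every pixel."""
    diffs = [(ca - cb) ** 2 for pa, pb in zip(a, b) for ca, cb in zip(pa, pb)]
    return sum(diffs) / len(diffs)


def toy_encoder(scene):
    """Stand-in feature extractor: 1.0 where there is an object, 0.0 for background."""
    return [0.0 if p == BACKGROUND else 1.0 for p in scene]


def latent_mse(a, b):
    """Mean squared error between the encoder features of two scenes."""
    fa, fb = toy_encoder(a), toy_encoder(b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)


target = make_scene(10, 3, 4, RED)      # what actually happens next
recolored = make_scene(10, 3, 4, BLUE)  # right position, wrong color
misplaced = make_scene(10, 5, 4, RED)   # right color, wrong position

for name, pred in (("recolored", recolored), ("misplaced", misplaced)):
    print(f"{name}: pixel={pixel_mse(pred, target):.3f}, "
          f"latent={latent_mse(pred, target):.3f}")
```

With these scenes, the pixel-space loss penalizes a correctly placed box of the wrong color (0.267) more than a correctly colored box in the wrong place (0.133), while the feature-space loss registers only the position error (0.000 vs 0.400).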
Reading both papers, I keep coming back to the same question: how close are we really?
SplitAdapter addresses a real problem (load variation during manipulation) but does so in a fairly constrained setting. Fixed object types, specific height ranges, controlled environments. The gap between "works in the lab" and "works in a warehouse" remains significant.
LDA-1B is more ambitious, but foundation models in robotics have a mixed track record. The field has seen several "this is the one" moments that didn't pan out when tested at scale in unstructured environments.
Neither paper addresses what happens when things go wrong. What if the box is heavier than expected? What if the shelf is at an unexpected angle? Robustness to the unexpected, rather than performance on the expected, is often what separates research demos from deployable systems.
We're in a strange moment for humanoid robots. Companies are announcing partnerships, raising massive rounds, and promising deployments. But the underlying technical challenges, like the ones these papers address, are still active research problems.
I think that's okay, actually. Progress is happening. The SplitAdapter approach to factorized adaptation seems genuinely useful. The LDA-1B work on leveraging heterogeneous data could accelerate training for everyone.
But we should be honest about timelines. These papers are solving pieces of the puzzle, not completing it. A robot that can reliably handle variable loads across variable heights in variable environments, while recovering gracefully from errors, while operating for 8-hour shifts, while being economically viable... that's still years away, at minimum.
I initially thought the humanoid deployment timelines some companies are promising (2025, 2026) were aggressive but plausible. After reading this week's research, I'm less sure. The problems being solved are real, but they're also basic. We're still figuring out how to pick up boxes.