Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
A robotic hand hovers over a coffee mug. It needs to grasp the handle, rotate the mug 90 degrees, and set it down without spilling the imaginary contents. For a human, this takes perhaps two seconds of unconscious effort. For the robot, it represents thousands of possible finger placements, force vectors, and contact sequences, most of which end with the mug on the floor.
This week, two papers appeared that attack this problem from different angles. ContactExplorer, from a team working on dexterous hand manipulation, proposes a new way to reward robots for discovering useful contact patterns. MotionDisco takes a broader view, using large language models to guide evolutionary search over whole-body humanoid motions. Both papers claim significant improvements over prior work. Both also reveal just how unsolved dexterous manipulation remains.
Reinforcement learning has gotten very good at certain things. Atari games. Navigation. Locomotion, where the contact patterns are relatively predictable (foot hits ground, foot leaves ground, repeat). But manipulation is different. The space of possible contacts between a five-fingered hand and an arbitrary object is enormous, and most contact configurations are useless or actively harmful.
Existing exploration methods struggle here. Novelty-based approaches that work well for navigation ("go somewhere you haven't been") translate poorly to manipulation, where the relevant novelty isn't spatial position but physical interaction. As the ContactExplorer authors note, contact-based novelty signals tend to be unstable, while distance-based novelty signals are inefficient.
ContactExplorer's solution is to represent contact as the intersection between object surface points and hand keypoints. The system maintains a counter that tracks how frequently each finger interacts with different regions of an object, conditioned on discretized object states. This counter drives two mechanisms: a count-based reward for exploring novel contact patterns, and an energy-based reaching reward that guides the hand toward under-explored contact regions.
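To make the counter idea concrete, here is a minimal sketch of a count-based contact bonus. It is not the authors' implementation: the class name, the `(state_bucket, finger, region)` key, and the 1/sqrt(count) bonus (a standard choice in count-based exploration) are all my assumptions. The energy-based reaching reward is not reproduced; `least_visited_region` is only a crude stand-in for "steer toward under-explored contact regions."

```python
import math
from collections import defaultdict


class ContactCounter:
    """Counts (state bucket, finger, object region) contact events.

    Hypothetical sketch: the key layout and bonus form are assumptions,
    not details taken from the ContactExplorer paper.
    """

    def __init__(self, bonus_scale=1.0):
        self.counts = defaultdict(int)
        self.bonus_scale = bonus_scale

    def novelty_reward(self, state_bucket, contacts):
        """Update the counter and return a count-based bonus.

        contacts: iterable of (finger, region) pairs active at this step.
        """
        reward = 0.0
        for finger, region in contacts:
            key = (state_bucket, finger, region)
            self.counts[key] += 1
            # Bonus shrinks as 1 / sqrt(visit count), so repeated contacts pay less.
            reward += self.bonus_scale / math.sqrt(self.counts[key])
        return reward

    def least_visited_region(self, state_bucket, finger, regions):
        """Return the region this finger has touched least in this state bucket."""
        return min(regions, key=lambda r: self.counts.get((state_bucket, finger, r), 0))


counter = ContactCounter()
first = counter.novelty_reward(0, [("index", 3)])   # new contact, full bonus
second = counter.novelty_reward(0, [("index", 3)])  # same contact again, smaller bonus
target = counter.least_visited_region(0, "index", [3, 4])
```

The point of the sketch is the shape of the signal: the bonus depends on which finger touched which part of the object in which object state, not on where the hand is in space.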
This is genuinely new, at least in its specific formulation. Prior work on contact-based exploration exists (the authors cite several examples), but the combination of learned hash codes for state discretization with the dual reward structure appears to be novel. The hash code approach is particularly interesting because it sidesteps the need to manually define what counts as a "different" object state.
The experimental results are promising but require careful interpretation. The paper evaluates on a diverse set of dexterous manipulation tasks and reports substantial improvements in sample efficiency and success rates over existing exploration methods. The authors also claim that contact patterns transfer robustly to real hardware.
I have some methodology concerns here. The paper doesn't specify exact sample sizes for the real-world transfer experiments, and "robustly" is doing a lot of work in that sentence. The sim-to-real gap for dexterous manipulation is notoriously brutal, and I'd want to see more detailed failure analysis before drawing strong conclusions about transfer. The project page (contact-explorer.github.io) may have additional details, but based on the abstract alone, this remains somewhat unclear.
That said, the core contribution, using contact coverage as an exploration signal, seems sound. The intuition makes sense: if you want a robot to learn manipulation, you should reward it for trying different ways of touching things, not just for moving to different positions.
MotionDisco operates at a different level of abstraction. Where ContactExplorer focuses on hand-object interaction, MotionDisco tackles whole-body humanoid loco-manipulation: tasks that require coordinated movement and manipulation over long time horizons.
The key claim here is that MotionDisco discovers contact-rich, long-horizon motions "from scratch," without teleoperation or motion retargeting from human demonstrations. This is a meaningful departure from the dominant paradigm in humanoid manipulation, which typically relies heavily on human motion data.
The approach couples three components: an LLM-guided evolutionary search over sequences of interactions, a sequential kinodynamic trajectory optimizer, and a pruning strategy to keep the search tractable. The LLM doesn't generate motions directly; instead, it guides the evolutionary search by suggesting promising interaction sequences to explore.
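The three-part structure can be sketched as a generic propose/evaluate/prune loop. This is a toy under loud assumptions, not MotionDisco's algorithm: `propose` is a random mutation standing in for the LLM's suggestions, `evaluate` is a mismatch count standing in for the kinodynamic trajectory optimizer, and the action names and target sequence are invented for illustration.

```python
import random


def evolve(seed_sequences, propose, evaluate, generations=5, keep=4, children=3):
    """Generic propose / evaluate / prune loop over interaction sequences.

    propose(sequence, rng) -> new sequence    (stand-in for LLM guidance)
    evaluate(sequence) -> cost, lower is better (stand-in for the optimizer)
    """
    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    population = [tuple(s) for s in seed_sequences]
    for _ in range(generations):
        candidates = set(population)  # parents survive unless something beats them
        for parent in population:
            for _ in range(children):
                candidates.add(tuple(propose(parent, rng)))
        # Pruning: keep only the lowest-cost candidates (ties broken by value).
        population = sorted(candidates, key=lambda c: (evaluate(c), c))[:keep]
    return population[0]


# Hypothetical toy problem: recover a fixed target interaction sequence.
ACTIONS = ["step", "reach", "grasp", "lift", "place"]
TARGET = ("step", "reach", "grasp", "lift")


def toy_propose(seq, rng):
    """Randomly replace one interaction (a real system would query the LLM here)."""
    seq = list(seq)
    seq[rng.randrange(len(seq))] = rng.choice(ACTIONS)
    return seq


def toy_cost(seq):
    """Number of positions that differ from the target sequence."""
    return sum(a != b for a, b in zip(seq, TARGET))


seed = ("step",) * 4
best = evolve([seed], toy_propose, toy_cost)
```

Even in this toy, the division of labor is visible: the proposer only decides what to try next, while the evaluator decides what survives. Swapping `toy_propose` for something smarter changes how quickly good sequences appear, not what counts as good.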
I know I'm being picky here, but the phrase "from scratch" deserves scrutiny. The LLM was trained on vast amounts of human-generated text, including descriptions of physical interactions. The trajectory optimizer encodes assumptions about humanoid kinematics and dynamics. "From scratch" really means "without explicit human motion demonstrations for these specific tasks," which is still valuable but not quite the same thing.
Using large language models to guide robotic skill discovery is having a moment. MotionDisco joins a growing list of papers that leverage LLMs for planning, reward shaping, or search guidance in robotics. The results are often impressive, but the mechanism remains somewhat mysterious.
What does the LLM actually contribute? The paper describes it as guiding evolutionary search, but the abstract doesn't detail how the LLM's suggestions are generated or validated. Is it proposing natural language descriptions of interactions that get translated into search constraints? Is it directly suggesting parameter modifications? The ablation studies apparently show that the LLM guidance matters ("our LLM-guided search discovers successful whole-body trajectories"), but I'd want to understand the counterfactual better. How much worse is pure evolutionary search without LLM guidance?
This isn't a criticism specific to MotionDisco; it's a broader concern about the current wave of LLM-robotics papers. The improvements are real, but we don't have great tools for understanding why they work or when they'll fail.
MotionDisco claims to transfer discovered motions to a real humanoid robot, making it "the first work to discover and deploy long-horizon humanoid loco-manipulation skills entirely through automated evolutionary search." This is a strong claim, and the supplementary video (available on YouTube) presumably demonstrates it.
The transfer pipeline involves training reinforcement learning tracking policies on the discovered trajectories. This is a standard approach, but it introduces another potential failure point. The discovered trajectory might be dynamically feasible in simulation but require tracking precision that the real robot can't achieve. Or the simulation might not capture relevant contact dynamics. Or any number of other things could go wrong.
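For readers unfamiliar with tracking policies, the reward usually looks something like the sketch below: the policy is paid for staying close to the reference trajectory at each timestep. The exponential form and the `sharpness` parameter are a common pattern and my assumption here; the abstract does not say what reward MotionDisco actually uses.

```python
import math


def tracking_reward(q, q_ref, sharpness=5.0):
    """Reward in (0, 1] that peaks when joint positions match the reference.

    q, q_ref: sequences of joint positions (same length, radians).
    sharpness: hypothetical tuning knob; larger values demand tighter tracking.
    """
    squared_error = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-sharpness * squared_error)


perfect = tracking_reward((0.1, 0.2), (0.1, 0.2))
sloppy = tracking_reward((0.1, 0.5), (0.1, 0.2))
```

A large `sharpness` makes the reward fall off quickly with error, which is one way a trajectory can look fine in simulation and still demand precision that real actuators can't deliver.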
I haven't watched the supplementary video, so I can't evaluate the real-world results directly. Based on the abstract alone, it's too early to say how robust the transfer actually is. "Long-horizon" could mean many things, and humanoid robots have a way of falling over at inconvenient moments.
Let me try to be precise about the contributions here.
ContactExplorer's novelty lies in its specific formulation of contact coverage as an exploration signal, particularly the use of learned hash codes for state discretization and the dual reward structure. This is incremental over prior work on contact-based exploration, but the increment appears meaningful. The key question is whether the approach generalizes beyond the specific tasks tested.
MotionDisco's novelty is in the combination of LLM-guided evolutionary search with kinodynamic optimization for long-horizon humanoid tasks. The individual components (LLMs for robotics, evolutionary search, trajectory optimization) are established; the integration and application to loco-manipulation is new. The claim about being the first to discover and deploy such skills through automated search is notable if it holds up.
Neither paper represents a paradigm shift. Both represent solid progress on genuinely difficult problems.
Dexterous manipulation remains one of the hardest problems in robotics. These two papers illustrate why.
ContactExplorer shows that even for single-hand manipulation of simple objects, we need sophisticated exploration strategies to learn useful behaviors. The space of possible contacts is too large for naive exploration, and the reward signal from task completion is too sparse. We need inductive biases that capture something about the structure of manipulation.
MotionDisco shows that scaling to whole-body, long-horizon tasks requires even more machinery: not just better exploration, but intelligent search over the space of possible interaction sequences. The number of possible interaction sequences grows combinatorially with task horizon and object count, and brute-force approaches won't work.
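A back-of-the-envelope count shows how fast this blows up. The numbers below are mine, not the paper's: I assume each step of a plan picks one (end effector, object) pair, which ignores timing, contact locations, and forces and therefore undercounts badly.

```python
def count_sequences(num_effectors, num_objects, horizon):
    """Number of interaction sequences if each step picks one (effector, object) pair.

    Deliberately crude: real contact planning has far more choices per step.
    """
    choices_per_step = num_effectors * num_objects
    return choices_per_step ** horizon


# Hypothetical humanoid: two hands and two feet, three objects in the scene.
short_plan = count_sequences(num_effectors=4, num_objects=3, horizon=5)   # 248832
long_plan = count_sequences(num_effectors=4, num_objects=3, horizon=10)   # 61917364224
```

Doubling the horizon from 5 to 10 steps squares the count, from roughly 250 thousand to roughly 62 billion sequences, each of which would need its own trajectory optimization to evaluate. That is the case for pruning and guided search in one line of arithmetic.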
Both papers also highlight the ongoing tension between simulation and reality. The algorithms are developed and evaluated primarily in simulation, with real-world transfer treated as a final validation step. This is practical (simulation is cheaper and safer), but it means we're always one step removed from the actual problem we care about.
Several questions remain open after reading these papers:
For ContactExplorer: How does the approach scale to more complex objects with richer contact geometries? The paper mentions diverse manipulation tasks, but the abstract doesn't specify what those are. Does the learned hash code representation generalize across objects, or does it need to be re-learned for each new object class?
For MotionDisco: What are the failure modes of the LLM-guided search? When does the LLM give bad suggestions, and how does the system recover? How long does the discovery process take for a new task, and how much human specification is required to define the task?
For both: How do these approaches compose? Could you use ContactExplorer's contact coverage signal within MotionDisco's search framework? Would that help or hurt?
If I were reviewing follow-up work in this area, I'd want to see:
More detailed failure analysis for real-world transfer. Not just success rates, but characterization of failure modes. When does sim-to-real break down, and why?
Longer time horizons and more complex object interactions. Both papers focus on relatively contained tasks. What happens when you need to manipulate multiple objects over minutes rather than seconds?
Better understanding of the LLM's role in MotionDisco. Ablations are good, but mechanistic understanding would be better. What knowledge is the LLM contributing, and could we extract it into a more interpretable form?
Cross-paper comparison. ContactExplorer and MotionDisco solve related but different problems. A unified evaluation on shared benchmarks would help the field understand the relative strengths of each approach.
(I realize I'm asking for a lot here. These are hard problems, and both papers represent meaningful progress. But the gap between current capabilities and robust, general-purpose manipulation remains large.)
ContactExplorer and MotionDisco both tackle the fundamental challenge of teaching robots to interact physically with the world. ContactExplorer offers a principled approach to exploration in dexterous manipulation through contact coverage rewards. MotionDisco demonstrates that LLM-guided evolutionary search can discover complex humanoid motions without human demonstrations.
Neither paper solves dexterous manipulation. Both papers make it slightly less unsolved. In a field where progress is measured in increments, that counts for something.
The real test will come in the next few years, as other researchers try to build on these methods. Do the ideas generalize? Do they compose with other techniques? Do they transfer to robots and tasks beyond those tested? We don't know yet. But the questions being asked are the right ones, and the approaches are at least plausible. That's more than can be said for a lot of robotics research.
I'll be watching the project pages for updates on real-world results. The videos, as always, will tell us more than the abstracts.