Two New Papers Tackle Robot Generalisation From Opposite Ends. Both Are Worth Paying Attention To.
RAM and MiDiGap approach the problem of making robots work across different bodies and tasks in genuinely distinct ways. One is infrastructure; the other is policy learning. Together they sketch something interesting.
5 hours ago · 9 min read
Think of a robot arm the way you might think of a car's turning radius. Before you can plan a route, you need to know what the vehicle can physically do. Every motion planner, every morphology designer, every trajectory optimiser working with robotic manipulators faces the same upstream problem: what poses can this robot actually reach, accounting for its geometry and the fact that its own links will sometimes get in the way of themselves? This is the reachability workspace problem, and it has been solved, unsolved, and re-solved dozens of times in the literature. Two papers published recently on arXiv suggest we may be entering a more mature phase of that conversation, though from very different angles.
The first, arXiv preprint 2606.09108, introduces RAM (Reachability Across Morphologies). The second, preprint 2505.03296, presents MiDiGap (Mixture of Discrete-time Gaussian Processes). They are not companion papers and do not cite each other. But read together, they address two adjacent and genuinely difficult problems: how do you characterise what a robot body can do, and how do you learn a policy that transfers across different robot bodies? The distinction matters.
Reachability workspace approximation is not a new problem. Methods based on Monte Carlo sampling, voxel grids, and capsule-based geometric approximations have existed for years. The issue is that these approaches tend to be slow, imprecise, or locked to a single robot morphology. If you change the arm, you recompute from scratch. For anyone working on morphology optimisation, where the robot's physical design is itself a variable being tuned, this is a serious bottleneck.
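To make the bottleneck concrete, here is a minimal sketch of the classical sampling approach, assuming a planar serial arm with unrestricted revolute joints and no self-collision checking. The function names, grid resolution and sample count are illustrative choices, not taken from either paper.

```python
import numpy as np

def forward_kinematics(link_lengths, joint_angles):
    """End-effector (x, y) of a planar serial arm for a batch of joint angles."""
    cumulative = np.cumsum(joint_angles, axis=1)   # absolute angle of each link
    x = np.sum(link_lengths * np.cos(cumulative), axis=1)
    y = np.sum(link_lengths * np.sin(cumulative), axis=1)
    return np.stack([x, y], axis=1)

def reachability_grid(link_lengths, n_samples=200_000, resolution=41, seed=0):
    """Boolean grid: True where at least one sampled configuration landed."""
    rng = np.random.default_rng(seed)
    link_lengths = np.asarray(link_lengths, dtype=float)
    angles = rng.uniform(-np.pi, np.pi, size=(n_samples, link_lengths.size))
    points = forward_kinematics(link_lengths, angles)
    reach = link_lengths.sum()
    # Map coordinates in [-reach, reach] to integer cell indices.
    idx = np.floor((points + reach) / (2 * reach) * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution, resolution), dtype=bool)
    grid[idx[:, 0], idx[:, 1]] = True
    return grid

grid_a = reachability_grid([1.0, 0.5])
grid_b = reachability_grid([0.8, 0.7])   # a new morphology means a full recompute
```

Every change to the link lengths triggers another full sampling pass, and the resulting grid is a lookup table with no useful gradient with respect to the design.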
RAM's contribution is to treat reachability as a morphology-conditioned implicit neural representation. To be precise, it learns a function that takes a robot morphology description and a query pose as input and outputs the probability that the pose is reachable, accounting for self-collisions. The model is trained on a dataset of 30 billion samples (3 × 10^10, as the abstract states) generated from forward kinematics alone; no inverse kinematics solver is required during training. That is a meaningful design choice: forward kinematics is cheap and parallelisable; IK is not.
The performance numbers are striking. The model achieves an F1-score of 86% at nanosecond inference speed, which the authors report as a 14 percentage point improvement over the baseline while reducing inference time by three orders of magnitude. Three orders of magnitude is not a rounding error. It is the difference between a computation that takes microseconds and one that takes nanoseconds, which matters enormously when you are running gradient-based optimisation in a loop.
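For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. This is the standard definition; how the paper thresholds its predicted probabilities before scoring is not assumed here. In this setting, a true positive (TP) is a reachable pose predicted reachable, a false positive (FP) is an unreachable pose predicted reachable, and a false negative (FN) is a reachable pose the model misses.

```latex
F_1 = \frac{2PR}{P + R},
\qquad P = \frac{TP}{TP + FP},
\qquad R = \frac{TP}{TP + FN}
```

A reduction of three orders of magnitude means the new inference time is the old one divided by $10^3$.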
The authors further demonstrate speed-ups of one order of magnitude for morphology optimisation and two orders of magnitude for trajectory optimisation. Speed-ups of this size do not just make existing pipelines faster; they make previously impractical workflows feasible. Differentiable reachability is particularly interesting here. If your workspace surrogate is differentiable, you can backpropagate through it during morphology search, which opens up gradient-based co-design in a way that sampling-based methods simply cannot support.
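What gradient-based design through a differentiable surrogate looks like can be sketched with a toy stand-in. This is not RAM's network: the surrogate below is a hand-written sigmoid on the arm's total reach, its gradient is derived by hand instead of by backpropagation, and the targets, length cost and step size are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_reachability(total_reach, target_radii, sharpness=10.0):
    """Smooth stand-in for a learned surrogate: ~1 inside the reach, ~0 outside."""
    return sigmoid(sharpness * (total_reach - target_radii))

def objective_and_grad(total_reach, target_radii, length_cost=0.1, sharpness=10.0):
    """Mean soft reachability of the targets minus a penalty on arm length."""
    p = soft_reachability(total_reach, target_radii, sharpness)
    value = p.mean() - length_cost * total_reach
    # d/dL sigmoid(k * (L - r)) = k * p * (1 - p), derived by hand here.
    grad = (sharpness * p * (1.0 - p)).mean() - length_cost
    return value, grad

targets = np.array([0.6, 0.9, 1.1, 1.3])   # distances the arm should reach
reach = 0.8                                 # initial design (hypothetical)
history = []
for _ in range(200):
    value, grad = objective_and_grad(reach, targets)
    history.append(value)
    reach += 0.05 * grad                    # gradient ascent on the design variable
```

The design variable grows until covering the farthest target is no longer worth the extra length. A boolean voxel grid offers no such gradient signal, which is the point the paragraph above makes.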
Is this genuinely new? Partially. Implicit neural representations for robot geometry have appeared before, including work on neural collision detection and learned signed distance functions for robot links. The novelty in RAM is the combination: morphology-conditioned, self-collision-aware, differentiable, and trained at a scale that supports generalisation to unseen robot designs. The cross-morphology generalisation claim is the part I would most want to see stress-tested. The paper reports results on unseen morphologies, which is the right evaluation, but it remains unclear how far outside the training distribution the model can generalise before F1 degrades meaningfully. That question is not fully answered by the current experiments.
(The dataset itself, 30 billion forward kinematics samples, is a contribution in its own right and the authors are publishing it. That kind of infrastructure work is undervalued in the research community and I want to flag it explicitly.)
MiDiGap is solving a different problem, though one with obvious connections. Where RAM is about characterising what a robot body can do, MiDiGap is about learning what a robot should do, and doing so efficiently enough to transfer across embodiments.
The approach is built on Gaussian process mixtures, specifically a mixture of discrete-time Gaussian processes used as a policy representation for imitation learning. The framing here is worth unpacking. Most recent work on robot manipulation policy learning has converged on diffusion-based approaches (Diffusion Policy, Chi et al., 2023 being the most cited) or transformer-based architectures trained on large datasets. MiDiGap is, in a sense, a deliberate step in the opposite direction: a structured probabilistic model that learns from as few as five demonstrations.
Five demonstrations. That number deserves scrutiny. The benchmark results are compelling: on constrained RLBench tasks, MiDiGap improves policy success by 76 percentage points over prior methods and reduces trajectory cost by 67%. On multimodal tasks (where the correct action is genuinely ambiguous), it improves success by 48 percentage points and increases sample efficiency by a factor of 20. In cross-embodiment transfer, it more than doubles policy success compared to baselines.
The results show something more specific than few-shot performance. The Gaussian process mixture representation is doing real probabilistic work here. Multimodal task distributions, where there are multiple valid ways to complete a task, are notoriously difficult for deterministic policy representations. A GP mixture naturally captures multimodality in the trajectory space, which is the right inductive bias for tasks like hanging a mug, where the robot might approach from several valid angles.
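A toy illustration of why multimodality matters, assuming one-dimensional lateral-offset trajectories that pass an obstacle at offset zero on either the left or the right. The two-component summary uses plain 2-means clustering over whole trajectories, a hard-assignment simplification of a mixture model and not MiDiGap's actual representation.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)
bump = np.sin(np.pi * t)                      # 0 at both ends, ~1 at the midpoint
rng = np.random.default_rng(0)

# Ten demonstrations: five pass the obstacle on the left, five on the right.
left = np.stack([+0.5 * bump + 0.02 * rng.standard_normal(t.size) for _ in range(5)])
right = np.stack([-0.5 * bump + 0.02 * rng.standard_normal(t.size) for _ in range(5)])
demos = np.concatenate([left, right])

# Unimodal summary: one mean trajectory for all demonstrations.
single_mean = demos.mean(axis=0)

# Two-component summary: 2-means clustering on whole trajectories.
centers = demos[[0, -1]].copy()               # initialise from one demo per side
for _ in range(10):
    dists = np.linalg.norm(demos[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centers = np.stack([demos[labels == k].mean(axis=0) for k in range(2)])

mid = t.size // 2
print(abs(single_mean[mid]), np.abs(centers[:, mid]))
```

The single averaged trajectory runs almost straight through the obstacle, because it averages two valid but incompatible strategies. The two component means each stay well clear of it.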
The inference-time steering capability is also notable. The authors develop tools for conditioning the policy at inference time using collision signals and kinematic constraints. This is not just post-hoc filtering; it is principled Bayesian conditioning. If the robot encounters an obstacle not present during training, the GP framework allows the trajectory distribution to be updated accordingly. That kind of update is genuinely hard to achieve with diffusion-based policies without retraining or expensive sampling.
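The underlying mechanism is the textbook Gaussian conditioning identity, sketched below for a single trajectory dimension. The RBF kernel, its hyperparameters and the via-point constraint are assumptions made for illustration; the paper's treatment of collision signals and kinematic constraints is more involved than this.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of time stamps."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

t = np.linspace(0.0, 1.0, 21)
prior_mean = np.zeros_like(t)                 # nominal trajectory (hypothetical)
prior_cov = rbf_kernel(t, t)

# Constraint discovered at run time: at t = 0.5 the path must pass through
# y = 0.3 (for example, to clear an obstacle), known up to a small tolerance.
idx = np.array([10])
y_obs = np.array([0.3])
noise = 1e-4

# Standard Gaussian conditioning on a noisy point observation.
k_star = prior_cov[:, idx]                                    # shape (21, 1)
k_obs = prior_cov[np.ix_(idx, idx)] + noise * np.eye(idx.size)
gain = np.linalg.solve(k_obs, k_star.T).T                     # shape (21, 1)
post_mean = prior_mean + gain @ (y_obs - prior_mean[idx])
post_cov = prior_cov - gain @ k_star.T
```

The posterior mean is pulled through the via-point and the variance collapses around it, while time steps far from the constraint keep close to their prior uncertainty. No retraining is involved; the update is a linear solve.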
I know I am being picky here, but the cross-embodiment transfer results need more context than the abstract provides. Doubling policy success is impressive, but the baseline success rates matter enormously for interpreting that number. If the baseline is 15% success and MiDiGap achieves 35%, that is meaningful but not yet deployment-ready. The paper is available and the full experimental details are there, but readers skimming abstracts should be cautious about the magnitude of the improvement without knowing the absolute numbers.
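The arithmetic behind that caution is simple enough to spell out. The helper below is hypothetical, and both pairs of success rates are made up for illustration (the first is the 15% to 35% case above); neither comes from the paper.

```python
def describe_improvement(baseline, improved):
    """Return (percentage-point gain, relative multiplier) for two success rates."""
    return (improved - baseline) * 100.0, improved / baseline

# Two illustrative cases that both count as "more than doubling" success.
for baseline, improved in [(0.15, 0.35), (0.40, 0.85)]:
    points, ratio = describe_improvement(baseline, improved)
    print(f"{baseline:.0%} -> {improved:.0%}: +{points:.0f} points, {ratio:.2f}x")
```

Both cases more than double the baseline, yet only the second ends anywhere near a success rate you would consider deploying.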
The computational story is also worth noting. MiDiGap trains on a CPU in under a minute for small datasets. That is a significant practical advantage over diffusion-based approaches, which typically require GPU training over hours or days. Whether this advantage holds as dataset size scales is a separate question; the authors report linear scaling, which is the right property, but the constant factors matter.
RAM and MiDiGap are not solving the same problem. But both, in different ways, attack the same weakness: current robot learning and planning pipelines are brittle to variation in robot morphology.
The standard assumption in most manipulation research is that you have a fixed robot, fixed sensors, and a fixed task distribution. You train, you evaluate, you publish. The problem is that real deployment involves variation in all three dimensions. Robots get upgraded. Arms are replaced. Tasks change. The research community has known this for years and has made incremental progress on each axis separately.
What is interesting about the current moment is that the tools for handling morphology variation are maturing on multiple fronts simultaneously. RAM provides a fast, differentiable characterisation of what a body can do. MiDiGap provides a policy representation that can be steered at inference time to respect kinematic constraints and transfer across bodies. These are complementary capabilities, and it is not hard to imagine a pipeline that uses something like RAM for morphology-aware planning and something like MiDiGap for policy execution.
It is too early to say whether either of these approaches will hold up at scale or in real deployment settings. The RAM paper's evaluation is on a dataset the authors themselves generated, which is the right approach for controlled evaluation but does not tell us how the model performs on robots with unusual link geometries or non-standard joint configurations. MiDiGap's cross-embodiment results are promising but, as noted, the absolute success rates and the specific embodiment pairs tested matter for drawing conclusions.
There is also a broader methodological question that neither paper fully addresses. Both approaches rely on the assumption that the relevant variation in morphology or task can be captured by the training distribution. For RAM, this means the training set of robot designs needs to cover the space of designs you might encounter. For MiDiGap, the five-demonstration regime works when the task is well-specified and the demonstrations are informative. In open-ended or adversarial settings, these assumptions may not hold. This assessment is based on the abstracts and available preprint material; a full replication study would be needed to make stronger claims.
For RAM, I want to see out-of-distribution evaluation on robot morphologies that are genuinely unlike anything in the training set: soft robots, parallel mechanisms, robots with more than six degrees of freedom in unusual configurations. The 86% F1 score is good, but the variance across morphology types is the number I want.
For MiDiGap, I want to see a direct comparison with diffusion-based policies at matched dataset sizes, not just in the few-shot regime. The few-shot results are the paper's strongest claim, but understanding where the GP mixture approach degrades relative to diffusion models would clarify exactly when you would choose one over the other.
For both, I want to see independent replication. Neither paper has been replicated by outside groups as of this writing, which is not a criticism; it is simply the normal state of preprint research. The code for MiDiGap is publicly available at the project page, which is the right move. RAM's website is listed in the abstract. Replication attempts will tell us a great deal more than the original evaluations can.
The deeper question these papers raise is about the architecture of the next generation of robot learning systems. If reachability can be queried in nanoseconds and policies can transfer across embodiments with a handful of demonstrations, the bottleneck shifts. It shifts toward task specification, toward the quality of demonstrations, toward the reliability of perception. Those are hard problems too, but they are different hard problems. And that, in a way, is what progress in robotics research looks like: not solving everything at once, but moving the constraint.