Planetary Rovers Are Learning to Walk on Sand, and the Results Are Surprisingly Practical
New research from NASA JPL and university labs shows reinforcement learning can teach rovers to handle loose soil without getting stuck, cutting energy use by 37% on sandy slopes.
By
·4 hours ago·6 min read
The rover sits at the base of a 20-degree sandy slope, its wheels half-buried in the kind of loose, dry soil that has stranded countless machines before it. A rover with a traditional passive suspension would spin its wheels uselessly here, digging deeper with every rotation. But this one, a four-wheeled prototype called ERNEST, shifts its weight forward, reconfigures its wheel angles, and climbs. No human is driving. A neural network trained entirely in simulation is making every decision.
I've seen enough spec sheets to know when a lab demo is just that, a demo. But the research coming out of planetary robotics labs this month suggests something more substantial: reinforcement learning is finally solving the granular terrain problem that has plagued off-world rovers for decades.
Anyone who's driven on a beach knows the feeling. Wheels sink. Traction disappears. The harder you push, the worse it gets.
For planetary rovers, this isn't an inconvenience. It's mission-critical. The lunar surface is covered in regolith, a fine, powdery material that behaves nothing like the rigid ground most robots are designed for. Mars has similar challenges. And the physics of a wheel sinking into loose soil is notoriously difficult to model accurately: the standard engineering tool, Bekker-Wong terramechanics, is semi-empirical and depends on soil coefficients that have to be measured for each material.
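To see why those soil coefficients matter, here is a minimal sketch of Bekker's classic pressure-sinkage relation, p = (k_c/b + k_phi) · z^n, for a flat rectangular contact patch of width b. This is the textbook plate model, not a full rigid-wheel model and not the method used in any of the papers discussed here. The function names and the soil numbers in the usage lines are made-up placeholders, not measured lunar or Martian values.

```python
from dataclasses import dataclass


@dataclass
class SoilParams:
    """Bekker soil coefficients (placeholders unless measured for a real soil)."""
    k_c: float    # cohesive modulus, kN/m^(n+1)
    k_phi: float  # frictional modulus, kN/m^(n+2)
    n: float      # sinkage exponent, dimensionless


def bekker_pressure_kpa(soil: SoilParams, width_m: float, sinkage_m: float) -> float:
    """Ground pressure (kPa) supporting a plate of the given width at a given sinkage."""
    return (soil.k_c / width_m + soil.k_phi) * sinkage_m ** soil.n


def sinkage_m(soil: SoilParams, load_n: float, width_m: float, length_m: float) -> float:
    """Invert the relation: how deep a width x length patch sinks under a vertical load."""
    pressure_kpa = load_n / (width_m * length_m) / 1000.0  # N/m^2 -> kPa
    return (pressure_kpa / (soil.k_c / width_m + soil.k_phi)) ** (1.0 / soil.n)


if __name__ == "__main__":
    soil = SoilParams(k_c=1.0, k_phi=1500.0, n=1.1)  # illustrative values only
    for width in (0.05, 0.10, 0.20):
        z = sinkage_m(soil, load_n=50.0, width_m=width, length_m=0.10)
        print(f"width {width:.2f} m -> sinkage {z * 1000:.1f} mm")
```

Widening the patch lowers the pressure, and with it the sinkage, which is the logic behind wide rover wheels. The catch is that k_c, k_phi, and n change from soil to soil and with moisture and compaction, so any model built on one fixed set of them inherits that uncertainty.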
Traditional rover designs address this with wide wheels, low speeds, and conservative path planning. The Curiosity rover on Mars, for instance, moves at a maximum speed of about 0.14 km/h. That's not a typo. It works out to roughly 4 centimeters per second, under a thirtieth of a brisk walking pace.
The new research takes a different approach: instead of avoiding difficult terrain, teach the rover to adapt to it.
The ERNEST paper, published this week, describes a rover with what the researchers call an Active Gimbal Suspension. Each wheel can adjust its yaw and roll independently, allowing the machine to redistribute its weight on the fly.
Here's where it gets interesting. A single neural network controls all four wheels simultaneously, making real-time decisions based on stereo camera data, chassis attitude, joint positions, and force-torque measurements. The controller doesn't switch between different modes for different terrain types. It just, well, figures it out.
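To make "a single network, no mode switching" concrete, here is a toy sketch of what such a controller's interface could look like: one observation vector in, commands for all four wheels out, with the same forward pass whatever the terrain. Every dimension, name, and action choice below (the terrain-feature size, three commands per wheel, `WheelPolicy`) is a hypothetical stand-in; the paper's actual architecture isn't described here, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical observation layout (the paper's real sizes are not given here).
N_WHEELS = 4
TERRAIN_FEATURES = 16   # compressed summary of stereo depth (assumed)
ATTITUDE = 3            # chassis roll, pitch, yaw rate (assumed)
JOINT_STATES = 3        # per wheel: yaw angle, roll angle, wheel speed (assumed)
FORCE_TORQUE = 6        # per wheel: Fx, Fy, Fz, Tx, Ty, Tz (assumed)
ACTIONS_PER_WHEEL = 3   # per wheel: yaw target, roll target, drive command (assumed)

OBS_DIM = TERRAIN_FEATURES + ATTITUDE + N_WHEELS * (JOINT_STATES + FORCE_TORQUE)
ACT_DIM = N_WHEELS * ACTIONS_PER_WHEEL


def init_layer(n_in, n_out, rng):
    """Random dense layer: one weight row per output unit, plus zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class WheelPolicy:
    """One network for all four wheels; no terrain-specific modes."""

    def __init__(self, hidden=64, seed=0):
        rng = random.Random(seed)
        self.layers = [init_layer(OBS_DIM, hidden, rng),
                       init_layer(hidden, hidden, rng),
                       init_layer(hidden, ACT_DIM, rng)]

    def act(self, obs):
        if len(obs) != OBS_DIM:
            raise ValueError(f"expected {OBS_DIM} observations, got {len(obs)}")
        h = dense(obs, self.layers[0], math.tanh)
        h = dense(h, self.layers[1], math.tanh)
        out = dense(h, self.layers[2], math.tanh)  # every command bounded to [-1, 1]
        # Split the flat action vector into one command triple per wheel.
        return [out[w * ACTIONS_PER_WHEEL:(w + 1) * ACTIONS_PER_WHEEL]
                for w in range(N_WHEELS)]


if __name__ == "__main__":
    policy = WheelPolicy()
    commands = policy.act([0.0] * OBS_DIM)  # the same call on sand, rock, or slope
    for wheel, (yaw, roll, drive) in enumerate(commands):
        print(f"wheel {wheel}: yaw {yaw:+.3f}, roll {roll:+.3f}, drive {drive:+.3f}")
```

In a real system the weights would come from reinforcement learning in simulation; the point of the sketch is only that terrain adaptation lives inside the learned mapping rather than in hand-written switching logic.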
The numbers are worth noting:
37% reduction in cost of transport on dry sand (compared to passive suspension)
Zero-shot transfer from simulation to physical hardware
Successful traversal of rock fields, sand ripples, and slopes up to 20 degrees
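Cost of transport is the usual way to compare locomotion efficiency across machines of different sizes: power divided by weight times speed, a dimensionless number where lower is better. A minimal sketch follows; the power, mass, and speed values are made up purely to show what a 37% reduction looks like and are not figures from the ERNEST paper.

```python
G_EARTH = 9.81  # m/s^2; roughly 1.62 on the Moon and 3.71 on Mars


def cost_of_transport(power_w: float, mass_kg: float, speed_m_s: float,
                      g: float = G_EARTH) -> float:
    """Dimensionless cost of transport: P / (m * g * v). Lower is more efficient."""
    return power_w / (mass_kg * g * speed_m_s)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical rover: same mass and speed, different drive power.
    passive = cost_of_transport(power_w=120.0, mass_kg=20.0, speed_m_s=0.1)
    learned = cost_of_transport(power_w=75.6, mass_kg=20.0, speed_m_s=0.1)
    print(f"passive CoT {passive:.2f}, learned CoT {learned:.2f}, "
          f"reduction {relative_reduction(passive, learned):.0%}")
```

When both controllers run on the same rover at the same speed, mass, gravity, and speed cancel, so the reduction is simply the ratio of drive power.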
On wet sand, where a passive suspension became "completely immobilized" (their words), the learned controller kept moving. That's a meaningful result.
Look, I'm usually skeptical when papers claim zero-shot sim-to-real transfer. From my time building hardware, I know how many things can go wrong when you move from a physics engine to actual motors and sensors. But the researchers used domain randomization and sensor noise injection during training: they deliberately varied the simulator's physical parameters and corrupted its sensor readings, so the policy never got to rely on a single, idealized version of the world. The technique has a solid track record in sim-to-real work on legged robots and robotic hands, and apparently it works here too.
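In practice, domain randomization usually comes down to two hooks in the training loop: re-sampling the simulator's physical parameters at every episode reset, and corrupting each observation before the policy sees it. Here is a minimal sketch; the parameter names, ranges, and noise level are hypothetical illustrations, not the ones the ERNEST team used.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodePhysics:
    ground_friction: float
    soil_stiffness_scale: float
    payload_mass_kg: float
    motor_strength_scale: float


# Hypothetical (low, high) ranges, for illustration only.
RANGES = {
    "ground_friction": (0.3, 1.0),
    "soil_stiffness_scale": (0.5, 1.5),
    "payload_mass_kg": (0.0, 5.0),
    "motor_strength_scale": (0.8, 1.2),
}
OBS_NOISE_STD = 0.02  # hypothetical Gaussian sensor-noise level


def sample_physics(rng: random.Random) -> EpisodePhysics:
    """Draw a fresh set of simulator parameters; call this at every episode reset."""
    return EpisodePhysics(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def noisy_observation(obs, rng: random.Random, std: float = OBS_NOISE_STD):
    """Add zero-mean Gaussian noise to every sensor channel before the policy sees it."""
    return [x + rng.gauss(0.0, std) for x in obs]


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        physics = sample_physics(rng)  # the simulator would be reconfigured with these
        obs = noisy_observation([0.0, 1.0, -0.5], rng)
        print(episode, physics, [round(v, 3) for v in obs])
```

Because the policy never sees the same world twice, it is pushed toward behavior that works across the whole range, which is the bet that lets it tolerate the real robot as just one more sample.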
A separate study on lunar quadruped locomotion tackles a related but distinct challenge: what happens when you train a walking robot assuming rigid ground, then deploy it on regolith?
The short answer: it doesn't go well.
The researchers compared policies trained on hard surfaces versus soft, granular contacts. The robots trained on soft contacts developed qualitatively different gaits, slower and more deliberate, with higher energy expenditure. That last part is important. Energy is at a premium on the Moon. There's no gas station, and at most locations solar panels sit idle through a lunar night that lasts roughly two Earth weeks.
What remains unclear is how much of this energy penalty is fundamental physics versus something that better algorithms could reduce. The paper doesn't fully answer that question, and I suspect it's because nobody knows yet. This is early-stage work.
The third piece of research I came across this week takes a completely different angle. Instead of improving the terrain model, a team working on neuromorphic reinforcement learning is trying to make the learning process itself more efficient.
Their approach replaces standard backpropagation with something called equilibrium propagation. The technical details are dense, but the practical implication is straightforward: 4.3x improvement in GPU memory efficiency during training, with comparable locomotion performance to conventional methods.
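For readers who want the gist of equilibrium propagation, here is a toy version of the textbook recipe (Scellier and Bengio's original formulation), not the neuromorphic variant in the paper. The network settles into an equilibrium twice: once freely, and once with its output gently nudged toward the target by a small factor beta. Each weight then changes in proportion to how the product of the two units it connects differs between the two equilibria, a purely local rule with no separate backward pass. To keep the sketch short it uses linear units, a three-unit hidden layer, and a single training example; all names and sizes are hypothetical.

```python
import random


def relax(x, W1, W2, h, y, beta=0.0, target=None, dt=0.5, steps=200):
    """Let hidden and output states settle by gradient descent on the energy.

    Energy (linear units): E = 0.5*|h|^2 + 0.5*|y|^2 - x.W1.h - h.W2.y
    The nudged phase adds beta * C, with cost C = 0.5*|y - target|^2.
    """
    h, y = list(h), list(y)
    for _ in range(steps):
        grad_h = [h[j]
                  - sum(x[i] * W1[i][j] for i in range(len(x)))
                  - sum(W2[j][k] * y[k] for k in range(len(y)))
                  for j in range(len(h))]
        grad_y = [y[k] - sum(h[j] * W2[j][k] for j in range(len(h))) for k in range(len(y))]
        if beta:
            grad_y = [g + beta * (y[k] - target[k]) for k, g in enumerate(grad_y)]
        h = [hj - dt * g for hj, g in zip(h, grad_h)]
        y = [yk - dt * g for yk, g in zip(y, grad_y)]
    return h, y


def eqprop_step(x, target, W1, W2, beta=0.1, lr=0.02):
    """One equilibrium-propagation update, in place. Returns the loss before the update."""
    n_h, n_y = len(W2), len(W2[0])
    h0, y0 = relax(x, W1, W2, [0.0] * n_h, [0.0] * n_y)         # free phase
    hb, yb = relax(x, W1, W2, h0, y0, beta=beta, target=target)  # nudged phase
    # Local contrastive rule: (lr / beta) * (nudged product - free product).
    for i in range(len(x)):
        for j in range(n_h):
            W1[i][j] += lr / beta * (x[i] * hb[j] - x[i] * h0[j])
    for j in range(n_h):
        for k in range(n_y):
            W2[j][k] += lr / beta * (hb[j] * yb[k] - h0[j] * y0[k])
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y0, target))


def init_weights(n_in, n_out, rng, scale=0.3):
    # Small weights keep the energy convex, so the relaxation converges.
    return [[rng.uniform(-scale, scale) for _ in range(n_out)] for _ in range(n_in)]


if __name__ == "__main__":
    rng = random.Random(0)
    W1, W2 = init_weights(2, 3, rng), init_weights(3, 1, rng)
    x, target = [1.0, -0.5], [0.5]
    losses = [eqprop_step(x, target, W1, W2) for _ in range(30)]
    print(f"loss {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The update needs only the two settled states, not a record of every intermediate relaxation step, which is one common intuition for why EqProp-style training can be lighter on memory than backpropagating through the same dynamics.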
Why does this matter for planetary rovers? Because cutting training memory is what makes learning on the robot itself plausible. If you can train onboard, rather than in a massive simulation cluster, you can adapt to conditions you didn't anticipate. A rover that lands on unexpectedly soft soil could, in theory, retrain its own gait overnight.
That's an ambitious claim, and the paper only demonstrates this on an A1 quadruped in simulation, not on actual planetary hardware. The real test is whether this scales to the kind of low-power processors that can survive space radiation and extreme temperatures.
NASA's Artemis program aims to return humans to the Moon by 2027 (though that timeline has, let's say, flexibility). The agency has been explicit about wanting robotic systems that can operate autonomously in advance of human crews, scouting terrain, delivering supplies, and constructing infrastructure.
Current lunar rovers are conservative by design. They stick to flat, well-mapped areas. They move slowly. They avoid anything that looks risky.
If the research I've described here pans out, future rovers could be more like off-road vehicles than golf carts. They could climb slopes, cross loose soil, and navigate obstacles without constant human oversight. The 37% lower cost of transport on dry sand isn't just a nice number; for the part of the power budget that goes into driving, it translates directly to longer traverses, longer missions, or smaller solar panels.
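The arithmetic behind that claim is simple, with one caveat worth making explicit. On a fixed energy budget, cutting cost of transport by a fraction r stretches driving range by 1/(1 − r); but the saving on the whole budget is only r times the share spent on locomotion. A minimal sketch, where the 50% driving share is a hypothetical placeholder and the 37% figure is the dry-sand result reported for ERNEST:

```python
def range_multiplier(cot_reduction: float) -> float:
    """Driving-range gain on a fixed energy budget when cost of transport drops by cot_reduction."""
    if not 0.0 <= cot_reduction < 1.0:
        raise ValueError("cot_reduction must be in [0, 1)")
    return 1.0 / (1.0 - cot_reduction)


def total_energy_saving(cot_reduction: float, driving_share: float) -> float:
    """Fraction of the whole energy budget saved when only driving_share of it goes to locomotion."""
    return cot_reduction * driving_share


if __name__ == "__main__":
    r = 0.37  # dry-sand figure reported for ERNEST
    print(f"range on the same battery: x{range_multiplier(r):.2f}")
    print(f"total saving if driving is half the budget (hypothetical): "
          f"{total_energy_saving(r, 0.5):.1%}")
```

Range grows faster than the headline percentage suggests, while the whole-mission saving shrinks in proportion to how much power goes to instruments, heaters, and communications instead of wheels.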
But I should note the limitations here. All of this work is based on relatively small-scale prototypes. The ERNEST rover is a lab demonstrator, not a flight-qualified system. The lunar quadruped exists only in simulation. And neuromorphic hardware that can actually run these algorithms in space doesn't exist yet, or at least hasn't been publicly demonstrated.
The gap between "works in the lab" and "works on the Moon" is substantial. Radiation, thermal cycling, communication delays, and the sheer difficulty of testing in realistic conditions all present challenges that these papers don't address.
Still, the trajectory is clear. Reinforcement learning is becoming the default approach for complex locomotion problems. The question isn't whether future planetary rovers will use learned controllers; it's whether the specific techniques demonstrated here will scale to operational systems.
From my time in hardware engineering, I learned to be skeptical of claims that skip over the messy details of integration, manufacturing, and reliability. These papers are research, not product announcements. But they're also some of the most practical work I've seen in planetary robotics in years.
The next step, I suspect, is a lunar demonstration mission with a learned locomotion controller. Several companies and agencies are working on exactly that. We should know within the next three to five years whether this approach actually works where it matters.