The Reward Engineering Problem Is Getting Fixed, Just Not the Way You Think

Four new papers from robotics researchers tackle one of RL's most stubborn bottlenecks, and the approaches are more varied and more interesting than the headlines suggest.

9 hours ago · 7 min read

Most of the coverage on LLMs in robotics right now focuses on manipulation demos and humanoid hype. What's getting less attention is a quieter but arguably more consequential problem that's been grinding away at researchers for years: reward engineering. Specifically, how do you tell a robot what "good" looks like without spending hundreds of engineering hours hand-crafting reward functions that break the moment conditions change?

Four papers published this month on arXiv take different angles on this problem, and together they sketch out something close to a coherent research direction. None of them are magic bullets. But the convergence is worth paying attention to.

What Do the Numbers Actually Say?

Let me be specific about what's actually being claimed across these four papers, because the abstracts can blur together if you're skimming.

The first paper, Self-CriTeach (arXiv:2509.21543), proposes a framework where an LLM essentially teaches itself how to plan robotic tasks. The mechanism is dual-purpose: the model generates symbolic planning domains, which then serve both as a source of training data (chain-of-thought trajectories for supervised fine-tuning) and as structured reward functions for reinforcement learning. The key claim is that this sidesteps the need for manual reward engineering while also reducing the cost of collecting chain-of-thought supervision, which historically has required human annotators or expensive oracle systems.
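
To make the dual-purpose idea concrete, here is a minimal sketch of how one symbolic planning domain could both score a plan and produce a step-by-step trace. This is not the paper's implementation. The STRIPS-style `Action` class, the toy pick-and-place domain, the predicate names, and the scoring rule (fraction of goal predicates reached, -1.0 for an inapplicable step) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action (hypothetical structure, for illustration only)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def run_plan(actions, initial_state, plan):
    """Execute a plan symbolically. Returns (final_state, trace, valid)."""
    state = set(initial_state)
    trace = []
    for step in plan:
        action = actions.get(step)
        # A step is rejected if it is unknown or its preconditions do not hold.
        if action is None or not action.preconditions <= state:
            trace.append(f"{step}: not applicable, stopping")
            return state, trace, False
        state = (state - action.del_effects) | action.add_effects
        trace.append(f"{step}: state is now {sorted(state)}")
    return state, trace, True


def plan_reward(actions, initial_state, goal, plan):
    """Assumed scoring rule: -1.0 for an invalid plan, else goal coverage in [0, 1]."""
    state, _, valid = run_plan(actions, initial_state, plan)
    if not valid:
        return -1.0
    return len(goal & state) / len(goal)


# Toy pick-and-place domain (all names are made up for this example).
ACTIONS = {
    "pick_block": Action(
        "pick_block",
        preconditions=frozenset({"hand_empty", "block_on_table"}),
        add_effects=frozenset({"holding_block"}),
        del_effects=frozenset({"hand_empty", "block_on_table"}),
    ),
    "place_in_bin": Action(
        "place_in_bin",
        preconditions=frozenset({"holding_block"}),
        add_effects=frozenset({"block_in_bin", "hand_empty"}),
        del_effects=frozenset({"holding_block"}),
    ),
}
INITIAL = {"hand_empty", "block_on_table"}
GOAL = frozenset({"block_in_bin", "hand_empty"})

good = plan_reward(ACTIONS, INITIAL, GOAL, ["pick_block", "place_in_bin"])
partial = plan_reward(ACTIONS, INITIAL, GOAL, ["pick_block"])
bad = plan_reward(ACTIONS, INITIAL, GOAL, ["place_in_bin"])
_, reasoning_trace, _ = run_plan(ACTIONS, INITIAL, ["pick_block", "place_in_bin"])
```

In this toy setup, `plan_reward` plays the role of the structured reward signal, while `reasoning_trace` is the kind of step-by-step record that could, under these assumptions, be rewritten into chain-of-thought training examples. In the framework the paper describes, the domain itself is generated by the LLM rather than written by hand as it is here.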

Sources