Two New Frameworks Tackle Reinforcement Learning's Reward Function Problem from Opposite Directions

One uses graph-based reasoning to auto-generate rewards; the other fuses human language and physical corrections. Both beat expert-designed baselines.

10 June 2026 · 5 min read

The hardest part of teaching a robot isn't the teaching. It's figuring out what to reward.

That's the core problem in reinforcement learning: you need a reward function that tells the system what "good" looks like, and designing one by hand is tedious and error-prone, and it often requires domain expertise that most end users don't have. Two new papers on arXiv tackle this from completely different angles, and both show results that, if they hold up in production, could meaningfully reduce the human effort required to train robotic systems.
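To make "designing one by hand" concrete, here is a minimal sketch of the kind of reward function an engineer might write for a simple reaching task. Nothing in it comes from either paper: the task, the function name `reach_reward`, the `success_radius` threshold, and the bonus values are all assumptions chosen for illustration.

```python
import math


def reach_reward(gripper_xyz, target_xyz, grasped, success_radius=0.02):
    """Hand-shaped reward for a hypothetical reaching task (illustrative only).

    Every constant below is a design decision a human has to guess at and
    tune, which is the effort automated reward design tries to remove.
    """
    distance = math.dist(gripper_xyz, target_xyz)

    # Dense shaping term: the closer the gripper is, the higher the reward.
    reward = -distance

    # Sparse bonus once the gripper is within the (assumed) success radius.
    if distance < success_radius:
        reward += 1.0

    # Extra bonus if the object is actually held (assumed weighting).
    if grasped:
        reward += 0.5

    return reward
```

Even in a toy like this, the relative weight of the distance term and the bonuses changes what the agent learns, and getting that balance wrong is a common way hand-written rewards fail.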

The first approach, Reward Evolution with Graph-of-Thoughts (RE-GoT), automates reward design entirely. The framework combines large language models with visual language models to generate and iteratively refine reward functions without human feedback. The key innovation here is structured reasoning: instead of asking an LLM to hallucinate a reward function in one shot, RE-GoT decomposes each task into a text-attributed graph that breaks the problem into analyzable components.
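The general shape of that idea, a graph whose nodes and edges carry text plus a generate-and-refine loop, can be sketched in a few lines. This is not the paper's implementation: `TaskGraph`, `evolve_reward`, and the `propose` and `evaluate` callables are hypothetical names, and the two callables stand in for whatever model calls and rollout evaluation a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TaskGraph:
    """Text-attributed graph: nodes and edges both carry free-text notes."""
    nodes: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, name: str, text: str) -> None:
        self.nodes[name] = text

    def add_edge(self, src: str, dst: str, text: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, dst, text))

    def to_prompt(self) -> str:
        """Flatten the graph into text a language model could read."""
        lines = [f"[{name}] {text}" for name, text in self.nodes.items()]
        lines += [f"{src} -> {dst}: {text}" for src, dst, text in self.edges]
        return "\n".join(lines)


def evolve_reward(
    graph: TaskGraph,
    propose: Callable[[str, str], str],            # (prompt, feedback) -> candidate
    evaluate: Callable[[str], Tuple[float, str]],  # candidate -> (score, feedback)
    rounds: int = 3,
) -> Tuple[Optional[str], float]:
    """Propose a candidate, score it, feed the critique back, keep the best."""
    best_code, best_score = None, float("-inf")
    feedback = ""
    for _ in range(rounds):
        candidate = propose(graph.to_prompt(), feedback)
        score, feedback = evaluate(candidate)
        if score > best_score:
            best_code, best_score = candidate, score
    return best_code, best_score
```

In a real pipeline, `propose` would presumably wrap a language-model call and `evaluate` would train or roll out a policy and return a critique. Both are left as plain callables here so the loop itself stays testable without any model.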

The numbers are worth paying attention to. On RoboGen benchmarks (10 tasks), RE-GoT improved average success rates by 32.25% over existing LLM-based methods. On ManiSkill2 manipulation tasks, it hit 93.73% average success across four tasks. That last figure is notable because it exceeds what expert-designed rewards achieve on those benchmarks.

Look, I've seen enough spec sheets to know that benchmark performance doesn't always translate to real-world deployment. But 93.73% on manipulation tasks is a strong result, and the fact that it beat hand-crafted rewards suggests the automated approach isn't just "good enough" but potentially better at capturing task requirements that human designers miss.
