Teaching Robots to Learn on the Job Is Getting Serious, and the Results Are Harder to Ignore
Three new papers on offline-to-online reinforcement learning suggest robots are getting much better at picking up skills without starting from scratch every time.
By
A robot that can learn pipe assembly to a 100% success rate in under two hours of real-world practice. That's not a press release number. That's what the Q2RL paper, posted on arXiv, is claiming, and I'll be honest: when I first read it, I assumed I'd misunderstood the benchmark.
I hadn't.
Look, here's the thing. When I was at Kuka, one of the persistent headaches with deploying any kind of adaptive control was the gap between what a robot learned in simulation or from pre-recorded data and what it actually needed to do on the factory floor. We called it the sim-to-real gap back then, though the problem is older than that phrase. You'd spend weeks curating training data, getting it clean and labelled and consistent, and then the robot would hit some edge case on the line that none of your demos had covered and you were back to square one. The engineers at the receiving end were not always patient about this. Understandably.
What's interesting about the current wave of offline-to-online reinforcement learning research is that it's directly attacking that problem, from a few different angles at once.
The Q2RL paper takes a fairly elegant approach. Instead of throwing away what a robot learned from behavior cloning (basically, watching demonstrations), it extracts a Q-function (an estimate of how much future reward a given action is likely to earn from a given state) from that cloned policy and uses it to guide online reinforcement learning once the robot is actually deployed. The trick they call Q-Gating switches between the imitation-learned behavior and the RL-learned behavior depending on which one looks better in the moment. On contact-rich tasks like pipe assembly and kitting, they're reporting success rates of up to 100% and improvements of up to 3.75 times over the original behavior cloning policy, all within one to two hours of on-robot interaction. That's genuinely fast. I've seen integration projects at mid-sized automotive suppliers that took six months to get a gripper reliably picking the same part every time.
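To make the gating idea a bit more concrete, here's a minimal Python sketch. It assumes the simplest rule I can think of: ask both policies for an action, score each proposed action with the Q-function, and execute whichever scores higher. The names (`q_gate`, `bc_policy`, `rl_policy`, `q_fn`, `toy_q`) are hypothetical, and the Q2RL authors may well gate differently, so read this as an illustration of the concept rather than their method.

```python
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]


def q_gate(
    state: State,
    bc_policy: Callable[[State], Action],
    rl_policy: Callable[[State], Action],
    q_fn: Callable[[State, Action], float],
) -> Tuple[Action, str]:
    """Return the higher-scoring action and which policy proposed it.

    Hypothetical sketch: a plain argmax over two candidate actions,
    assumed here for illustration, not the paper's exact Q-Gating rule.
    """
    a_bc = bc_policy(state)  # action from the behavior-cloned (imitation) policy
    a_rl = rl_policy(state)  # action from the online RL policy
    # Strictly-greater comparison: on a tie, fall back to the imitation action.
    if q_fn(state, a_rl) > q_fn(state, a_bc):
        return a_rl, "rl"
    return a_bc, "bc"


# Toy 1-D Q-function (hypothetical): prefers actions close to 0.5.
def toy_q(state: State, action: Action) -> float:
    return -(action[0] - 0.5) ** 2


if __name__ == "__main__":
    action, source = q_gate([0.0], lambda _: [0.0], lambda _: [0.4], toy_q)
    print(source, action)  # rl [0.4]
```

Ties go to the imitation policy in this sketch, on the theory that the demonstrated behavior is the safer default on a real production line. That's my design choice for the example, not something the paper specifies.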