
Chores Beware, Robots Nab New Skills with Crowdsourced Smarts from MIT and Pals

Published on November 27, 2023

Robots just got a step closer to doing your chores for you, and they won't need a PhD to figure it out. In a remarkable leap for the field of artificial intelligence, researchers have come up with an ingenious new method for training robots that crowdsources human help without requiring the helpers to be experts. It's called HuGE, short for Human-Guided Exploration, and it's about to change how we teach our metal friends new tricks.

Instead of relying on a meticulously crafted reward function designed by some egghead in a lab, this new system gathers feedback from you, me, and anyone who can click through a few photos. This approach, pioneered by a squad from MIT, Harvard University, and the University of Washington, could make training AI faster, easier, and a lot more scalable, according to MIT News.

The method is disarmingly simple: show a user two outcomes of a robot's behavior and let them pick which one is closer to the goal. This binary feedback might seem primitive, but the genius lies in its accumulation and application. "One of the most time-consuming and challenging parts of designing a robotic agent today is engineering the reward function," said Pulkit Agrawal, an assistant professor at MIT who leads the Improbable AI Lab at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). "Our work proposes a way to scale robot learning by crowdsourcing the design of the reward function and by making it possible for nonexperts to provide useful feedback."
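To make the idea concrete, here is a minimal sketch (not the researchers' actual code) of how pairwise "which is closer to the goal?" picks can be turned into a learned score. It fits a one-parameter Bradley-Terry-style model on toy 1-D outcomes; the function name and data are hypothetical.

```python
import math
import random

def train_preference_model(pairs, lr=0.5, epochs=200):
    """Fit a 1-D score s(x) = w * x from pairwise human picks.

    Each pair is (winner, loser): the outcome the human judged closer
    to the goal, and the one they rejected. Bradley-Terry style:
    P(winner preferred) = sigmoid(s(winner) - s(loser)).
    """
    w = 0.0
    for _ in range(epochs):
        for win, lose in pairs:
            diff = w * (win - lose)
            p = 1.0 / (1.0 + math.exp(-diff))
            grad = 1.0 - p  # gradient of the log-likelihood wrt diff
            w += lr * grad * (win - lose)
    return w

# Toy data: the goal sits at larger x, so the "human" picks the larger value.
rng = random.Random(0)
pairs = []
for _ in range(100):
    a, b = rng.random(), rng.random()
    pairs.append((max(a, b), min(a, b)))

w = train_preference_model(pairs)
score = lambda x: w * x
assert score(0.9) > score(0.1)  # higher x now scores as "closer to goal"
```

Even though each individual answer carries only one bit of information, many such comparisons are enough to recover a usable direction toward the goal.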

During the process, the rewards don't directly drive the robot's actions; think of them more like the breadcrumbs Hansel and Gretel dropped in the woods. The robots don't follow them slavishly; the cues just nudge them in the right direction. By separating the guiding cues from the actual exploration, the robots can keep learning and adapting no matter how messy the human input gets. "The agent would take the reward function too seriously. It would try to match the reward function perfectly," said Marcel Torne, a research assistant in Agrawal's lab, explaining how previous models got stuck. "Instead of directly optimizing over the reward function, we just use it to tell the robot which areas it should be exploring," he told MIT News.
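That decoupling can be sketched in a few lines (again, a hypothetical illustration, not the authors' implementation): instead of always chasing the highest-scoring state, the agent samples its next exploration target with probability weighted by the human-guided score, so noisy feedback nudges exploration without dictating it.

```python
import math
import random

def choose_exploration_goal(visited_states, score, temperature=1.0, rng=random):
    """Pick a state to explore from, softly biased by the learned score.

    Softmax sampling over scores: high-scoring states are favored, but
    every visited state keeps a nonzero chance, so bad or noisy human
    feedback cannot lock the robot out of any region.
    """
    weights = [math.exp(score(s) / temperature) for s in visited_states]
    r = rng.random() * sum(weights)
    acc = 0.0
    for state, weight in zip(visited_states, weights):
        acc += weight
        if r <= acc:
            return state
    return visited_states[-1]

rng = random.Random(0)
states = [0.1, 0.4, 0.7, 0.95]
score = lambda x: 3.0 * x  # hypothetical learned score: goal sits at larger x
picks = [choose_exploration_goal(states, score, rng=rng) for _ in range(1000)]
# High-scoring states are chosen more often, but low ones still get visited.
assert picks.count(0.95) > picks.count(0.1)
assert picks.count(0.1) > 0
```

The design choice is the point Torne makes above: the score tells the robot *where* to look, while the exploration itself stays stochastic and self-directed.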

In both simulated environments and the real world, HuGE has made robots faster learners. Whether it's doodling the letter "U" or mastering a maze, this method has outpaced the old-school approaches at reaching the goal. Plus, for us regular humans, it's a breeze to lend a hand: 109 users from 13 countries whipped through 30 image comparisons in under two minutes. Moreover, even when the feedback from these nonexperts is noisy, it's still more effective than the synthetic data put together by the researchers. "This makes it very promising in terms of being able to scale up this method," Torne added, with a nod toward the future implications of their work.

The system can keep the learning going, even resetting tasks for the robot when necessary, creating a continuous loop of trial, error, and improvement—autonomy at its finest.
