Google AI researchers call this solution “off-policy classification,” or OPC, which evaluates the performance of AI-driven agents by treating that “evaluation as a classification problem.”
Google’s solution builds on reinforcement learning, which rewards software policies for achieving specified goals, and scales to image-based tasks such as vision-based robotic grasping. The technique can also learn from older data, training several models on the same collected dataset and then selecting the “best one” for the machine learning task at hand, Robotics at Google software engineer Alex Irpan told VentureBeat.
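The core idea of treating evaluation as classification can be sketched in a few lines. The toy below is an illustration only, not Google's implementation: it assumes a learned Q-function has already scored a batch of logged state-action pairs, and that transitions from successful attempts are labeled positive. The function name `opc_score`, the threshold, and all data are invented for this sketch.

```python
# Hypothetical sketch of the OPC idea: policy evaluation as classification.
# A Q-function scores logged state-action pairs; pairs from successful
# trajectories are labeled positive. A Q-function that separates the two
# classes well should correspond to a better-performing policy.

def opc_score(q_values, success_labels, threshold=0.5):
    """Fraction of logged transitions classified correctly: a Q-value
    above the threshold predicts an 'effective' action."""
    correct = sum(
        (q > threshold) == bool(label)
        for q, label in zip(q_values, success_labels)
    )
    return correct / len(q_values)

# Toy data: Q-value estimates and whether each transition came from a
# successful attempt (values invented for illustration).
q_values = [0.9, 0.8, 0.2, 0.6, 0.1]
success_labels = [1, 1, 0, 1, 0]
print(opc_score(q_values, success_labels))  # → 1.0 on this toy data
```

The appeal of a metric like this is that it needs only logged data and labels, so candidate models can be ranked offline before any of them is run on a real robot.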
Where OPC will be used:
Decision makers may see versions of OPC at the NeurIPS 2019 conference, especially during the MineRL competition. In this competition, participants work within the Minecraft environment and are charged with training an agent via reinforcement learning to collect diamonds. The organizers also provide participants the MineRL dataset, a collection of human demonstrations.
Since the concept and application of OPC are still evolving and complicated, competitions like MineRL give players and decision makers first-hand experience with it. They also get to work with human demonstration data: “It is difficult for individual researchers to collect large amount[s] of demonstrations to test their ideas,” says EndtoendAI. “The competition alleviates this problem and allows researchers to implement their own algorithms without worrying about collecting data.”
Finally, exposure to competitions like these will prepare researchers and decision makers for the future of OPC, especially as its solutions and applications become more complex and more widely used, Irpan told VentureBeat. “[W]e think the results are promising enough to be applied to many real-world RL problems,” he said.