In a few weeks, in Osaka, Japan, we will present our latest work on ways to enable robots to autonomously choose how they learn, in an article titled Autonomous Reuse of Motor Exploration Trajectories.
We decided to explore the idea of enabling a robot to modify its own learning method based on its previous experience. Humans do the same: when studying, for instance when learning a piece of knowledge by heart, they explore different strategies, such as reading multiple times, rewriting, enunciating, visualizing, and repeating the learning sessions, or learning only once, the night before the test. Each individual eventually chooses a preferred strategy and tweaks its specifics as it is reused, often based on its perceived effectiveness. Humans learn how to learn.
In our work, we focused on how a robot could improve the way it explores a new, unknown task. The exploration strategy is an important factor in learning effectiveness; work on intrinsic motivation by our team demonstrated that. Moreover, autonomous robots, while potentially subjected to very diverse situations, retain a constant morphology: their kinematics and dynamics remain stable, and their motor space, the set of possible motor commands, stays the same. Our hypothesis was that some exploration strategies are a priori more effective for a given robot, and that those strategies can be uncovered by analyzing past learning experience, that is, past exploration trajectories. Using those exploration strategies would lead to an increase in learning performance compared to random ones.
Our experiment confirmed that. We identified, through autonomous empirical measurement, the motor commands of a first task that belonged to areas where learning had been the most effective, and then reused them on a similar but different task, where the robot didn't have access to the learning experience of the first task. The early learning performance increased significantly. Our method only requires that the motor space stays the same. The sensory space can be arbitrarily different, which makes it possible to reuse an exploration strategy to learn in another modality. The method also makes no assumptions about the learning algorithms used, which can even differ between tasks.
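To make the idea concrete, here is a minimal sketch of the reuse step, not the code from the article. Everything in it is an assumption made for illustration: the two toy tasks (task_a, task_b), the grid-based notion of a "new" sensory effect used as a stand-in for learning effectiveness, and the helper names. The one property it shares with the method described above is that only the motor commands cross from the first task to the second, while the sensory spaces differ.

```python
import math
import random

def task_a(m):
    """Hypothetical first task: hand position of a 2-joint planar arm."""
    a, b = m
    return (math.cos(a) + math.cos(a + b), math.sin(a) + math.sin(a + b))

def task_b(m):
    """Hypothetical second task: same motor space, different sensory space
    (distance and bearing of the hand, with a shorter forearm)."""
    a, b = m
    x = math.cos(a) + 0.5 * math.cos(a + b)
    y = math.sin(a) + 0.5 * math.sin(a + b)
    return (math.hypot(x, y), math.atan2(y, x))

def cell(effect, size=0.25):
    """Discretize a sensory effect into a grid cell."""
    return tuple(math.floor(v / size) for v in effect)

def explore(task, commands):
    """Execute commands in order. Return the set of cells reached and the
    commands that were the first to reach a new cell (assumed proxy for
    'commands from areas where learning was effective')."""
    seen, productive = set(), []
    for m in commands:
        c = cell(task(m))
        if c not in seen:
            seen.add(c)
            productive.append(m)
    return seen, productive

def random_commands(rng, n):
    """Draw n motor commands uniformly from the shared motor space."""
    return [(rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi))
            for _ in range(n)]

rng = random.Random(0)

# Task A: explore at random, keep only the commands that proved productive.
_, productive_a = explore(task_a, random_commands(rng, 500))

# Task B: replay a subset of those commands; only motor commands are carried
# over, no sensory data or learned model from task A.
budget = 50
reused = rng.sample(productive_a, min(budget, len(productive_a)))
baseline = random_commands(rng, len(reused))

seen_reuse, _ = explore(task_b, reused)
seen_random, _ = explore(task_b, baseline)
print("cells reached on task B with reuse: ", len(seen_reuse))
print("cells reached on task B at random:  ", len(seen_random))
```

The printed counts depend on the seed and on these toy tasks, so they say nothing about the results reported in the article; the sketch only shows the data flow between the two tasks.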
The article is available here.
We released the code used to run the experiments, so that anyone can reproduce and analyze them. You can access it here.
Reference: Fabien Benureau, Pierre-Yves Oudeyer, “Autonomous Reuse of Motor Exploration Trajectories”, in the proceedings of ICDL-Epirob 2013, Osaka, Japan.