Humans, and some other animals, devote much time and energy to exploring and obtaining information, and sometimes the search for information can be independent of a foreseeable profit, as if learning were reinforcing in and of itself. This is associated with our high degree of curiosity, our intrinsic desire to know and understand.
Such intrinsic motivation mechanisms are observed throughout life, from infants’ spontaneous exploration of their bodies and external objects to adults reading novels or conducting research.
Together with my colleagues, I have been studying these mechanisms of curiosity-driven learning and information seeking within a systemic and multidisciplinary approach, modeling them using algorithmic and robotic tools in constant dialog with developmental psychology, neuroscience, and statistical theories of learning.
Beyond identifying the richness and variety of intrinsic motivation mechanisms, this work has led us to establish fundamental links between curiosity-driven learning and cognitive development. In particular, we have shown that such mechanisms can self-organize complex developmental structures, in which stages of increasing behavioral and cognitive complexity form spontaneously. For example, we showed that an intrinsic drive pushing a robot to search for situations where it experiences learning progress can spontaneously lead it to first explore and discover its own body, then external object affordances, and finally vocal and proto-linguistic interaction with others.
In such a vision, a limited set of meta-cognitive structures allows a learner to autonomously and actively select and order its own learning experiences, creating its own curriculum where skills, including the manipulation of external objects, get naturally sequenced towards increasing complexity.
In a related strand of research, detailed on this page, we have also used and extended these models for engineering purposes, and have shown that such curiosity-driven active learning mechanisms have strong properties of robustness and efficiency for lifelong robot learning of sensorimotor skills, especially in the context of strategic learning.
A good starting point for reading about our work in this area is the following three articles:
Information Seeking, Curiosity and Attention: Computational and Neural Mechanisms
Gottlieb, J., Oudeyer, P-Y., Lopes, M., Baranes, A. (2013)
Trends in Cognitive Sciences. http://dx.doi.org/10.1016/j.tics.2013.09.001 Bibtex
Intrinsic Motivation Systems for Autonomous Mental Development
Oudeyer P-Y, Kaplan, F. and Hafner, V. (2007)
IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286. Bibtex
What is intrinsic motivation? A typology of computational approaches
Oudeyer P-Y. and Kaplan F. (2007)
Frontiers in Neurorobotics, 1:6. Bibtex
Direct links towards more details:
Note: complementary information on this topic is also available on the FLOWERS INRIA team web site.
IAC and the Playground Experiment. The Intelligent Adaptive Curiosity (IAC) architecture and its implications were studied in a series of experiments called the Playground Experiments (Oudeyer and Kaplan, 2006; Oudeyer et al., 2007). Figure 1 illustrates the cognitive architecture employed by IAC. Prediction learning plays a central role in this architecture, in which two modules predict future states. First, the “Classic Machine learner” M learns a forward model. The forward model receives as input the current sensory state, context, and action, and generates a prediction of the sensory consequences of the planned action. An error feedback signal, based on the difference between predicted and observed consequences, is used to update the forward model. Second, the “Meta Machine learner” metaM receives the same input as M, but instead of predicting the sensory consequences, it learns a meta-model that predicts how much the errors of the lower-level forward model will decrease in local regions of the sensorimotor space, i.e. it models learning progress locally. To deal with the difficulties of generalization and of high-dimensional continuous spaces, an associated categorization mechanism progressively splits the sensorimotor space into sub-regions, for example by maximizing their differences in predictability (Baranes and Oudeyer, 2009), and focuses its refinement of the categorization on regions where learning progress is maximal. Then, in each observed context/state, an action selection system chooses stochastically which actions to try so as to maximize expected learning progress. Such a system allows the robot to automatically avoid actions whose outcome is either trivial or too difficult to predict or learn at a given moment of development, while first focusing on simple actions and progressively shifting to more complex ones.
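The core idea of selecting actions by expected learning progress can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the IAC implementation: the regions are fixed rather than progressively split, the forward model is abstracted away so that only its prediction errors are recorded, learning progress is estimated as the drop in mean error between two consecutive windows, and epsilon-greedy choice stands in for the stochastic action selection. The names `Region`, `learning_progress`, and `choose_region` are hypothetical.

```python
import random
from collections import deque


class Region:
    """Records recent forward-model prediction errors for one region."""

    def __init__(self, window=4):
        self.window = window
        # Keep the last 2 * window errors: an older half and a recent half.
        self.errors = deque(maxlen=2 * window)

    def add_error(self, error):
        self.errors.append(error)

    def learning_progress(self):
        """Decrease in mean error from the older half to the recent half."""
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough data yet to estimate progress
        errs = list(self.errors)
        older = sum(errs[:self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent


def choose_region(regions, epsilon=0.2, rng=random):
    """Epsilon-greedy choice (an assumption): usually pick the region
    with the highest estimated learning progress, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.randrange(len(regions))
    progress = [r.learning_progress() for r in regions]
    return max(range(len(regions)), key=lambda i: progress[i])


# Toy error histories: a trivial region (always predictable), a learnable
# region (errors shrink), and an unlearnable one (errors fluctuate, no trend).
trivial, learnable, noisy = Region(), Region(), Region()
for t in range(8):
    trivial.add_error(0.0)
    learnable.add_error(1.0 / (t + 1))
    noisy.add_error(0.5 if t % 2 == 0 else 0.7)

best = choose_region([trivial, learnable, noisy], epsilon=0.0)
print("region with highest learning progress:", best)
```

In this toy setting, both the trivial and the unlearnable regions show no error decrease, so a learner driven by learning progress concentrates on the region where its predictions are still improving.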
In order to evaluate the IAC architecture in a physical implementation, the Playground Experiments were developed (Oudeyer and Kaplan, 2006; Oudeyer et al., 2007). During the experiment, a quadruped robot is placed on an infant play mat and presented with a set of nearby objects, as well as an “adult” robot caretaker (see Figure 2). The robot is equipped with four kinds of motor primitives, each parameterized by several continuous numbers; these primitives can be combined, thus forming an infinite set of possible actions: (a) turning the head in various directions; (b) opening and closing the mouth while crouching with various strengths and timing; (c) rocking the leg with various angles and speeds; (d) vocalizing with various pitches and lengths. Similarly, several kinds of sensory primitives allow the robot to detect visual movement, salient visual properties, proprioceptive touch in the mouth, and the pitch and length of perceived sounds. For the robot, these motor and sensory primitives are initially black boxes: it has no knowledge of their semantics, effects, or relations. The IAC architecture is then used to drive the robot’s exploration and learning purely by curiosity, i.e. by the search for learning progress. The nearby objects include an elephant (which can be bitten or “grasped” by the mouth), a hanging toy (which can be “bashed” or pushed with the leg), and an adult robot “caretaker” pre-programmed to imitate the learning robot when the latter looks at the adult while vocalizing.
Open-ended and embodied acquisition of skills. A key finding from the Playground Experiments is the self-organization of structured developmental trajectories, where the robot explores objects and actions in a progressively more complex, stage-like manner, while autonomously acquiring diverse affordances and skills that can be reused later on. Across a series of runs of such experiments, the following developmental sequence is typically observed:
New hypothesis for infant development. Two aspects of this outcome can be noted. First, it shows how an intrinsic motivation (IM) system can drive a robot to autonomously learn a variety of affordances and skills for which no engineer provided specific reward functions beforehand. Second, the observed process spontaneously generates three properties of infant development so far mostly unexplained:
Interaction with other developmental constraints: imitation, maturation, cognitive abstractions and embodiment.
Intrinsic motivation systems can be conceptualized as one among many interacting mechanisms that help organisms (natural or artificial) to explore and learn efficiently in very large sensorimotor spaces. Such other mechanisms include social guidance (e.g. imitation learning), cognitive abstraction (e.g. unsupervised perceptual learning that creates internal concepts or goals out of raw sensorimotor values), embodiment and maturation (i.e. evolution of morphological properties of the body). The following article discusses the importance of integrating these mechanisms within an entire cognitive system:
Intrinsically Motivated Learning of Real-World Sensorimotor Skills with Developmental Constraints
Oudeyer P-Y., Baranes A., Kaplan F. (2013)
in Intrinsically Motivated Learning in Natural and Artificial Systems, eds. Baldassarre G. and Mirolli M., Springer. Bibtex
Keywords: developmental robotics, epigenetic robotics, intrinsic motivation, curiosity, values, development, intrinsically motivated reinforcement learning, autonomy, behaviour, developmental trajectory, complexity, active learning.
The Playground Experiment. We have built an experimental setup, called the Playground Experiment, which allowed us to show how the curiosity algorithm we developed leads to the self-organization of developmental trajectories with sequences of behavioural stages of increasing complexity (Oudeyer et al., 2007; Oudeyer and Kaplan, 2006).
Learning omnidirectional quadruped locomotion. In this experiment, we showed how the successive architectures we developed allow a quadruped robot, initially equipped with parameterized motor primitives in the form of a 24-dimensional oscillator (sinusoids with various parameters in most of the joints), to learn to use these motor primitives to locomote precisely in all directions and in varied manners. In the article (Baranes and Oudeyer, 2013), we extensively study a physical simulation of this experimental setup with active learning algorithms.
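To give a concrete feel for what a sinusoidal motor primitive driven by a 24-dimensional parameter vector could look like, here is a minimal Python sketch. The layout of the parameters is an assumption made purely for illustration: 12 joints, each with an amplitude and a phase, sharing a single frequency. The parameterization actually used in the experiment is not described here, and the function name `joint_angles` is hypothetical.

```python
import math


def joint_angles(params, t, frequency=1.0):
    """Joint targets at time t from a flat list of (amplitude, phase) pairs.

    Assumed layout (hypothetical): params = [a_0, p_0, a_1, p_1, ...],
    one sinusoid per joint, all joints sharing one frequency in Hz.
    """
    if len(params) % 2 != 0:
        raise ValueError("params must hold (amplitude, phase) pairs")
    angles = []
    for i in range(0, len(params), 2):
        amplitude, phase = params[i], params[i + 1]
        angles.append(
            amplitude * math.sin(2.0 * math.pi * frequency * t + phase)
        )
    return angles


# A 24-dimensional parameter vector: 12 joints x (amplitude, phase).
params = [0.5, 0.0] * 12
trajectory = [joint_angles(params, step / 20.0) for step in range(20)]
```

Under this kind of parameterization, the learner's task is to discover which points of the 24-dimensional parameter space produce which displacement of the robot.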
Information Seeking, Curiosity and Attention: Computational and Neural Mechanisms
Gottlieb, J., Oudeyer, P-Y., Lopes, M., Baranes, A. (2013)
Trends in Cognitive Sciences, 17(11), pp. 585–596. http://dx.doi.org/10.1016/j.tics.2013.09.001 Bibtex Pdf preprint
Self-organization of early vocal development in infants and machines: the role of intrinsic motivation
Moulin-Frier, C., Nguyen, S.M., Oudeyer, P-Y. (2014)
Frontiers in Psychology (Cognitive Science), 4(1006). Bibtex
Active Choice of Teachers, Learning Strategies and Goals for a Socially Guided Intrinsic Motivation Learner
Nguyen, S.M., Oudeyer, P-Y. (2013)
Paladyn Journal of Behavioural Robotics.
Intrinsically Motivated Learning of Real-World Sensorimotor Skills with Developmental Constraints
Oudeyer P-Y., Baranes A., Kaplan F. (2013)
in Intrinsically Motivated Learning in Natural and Artificial Systems, eds. Baldassarre G. and Mirolli M., Springer. Bibtex
Curiosity Driven Phonetic Learning
Moulin-Frier, C. and Oudeyer, P-Y. (2012)
in Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob), San Diego, USA. Bibtex (Best Paper Award, category “Models of Cognitive Development”).
Stable kernels and fluid body envelopes
Kaplan, F., Oudeyer, P-Y. (2009)
SICE Journal of Control, Measurement, and System Integration, 48(1).
Computational Models in the Debate over Language Learnability
Kaplan, F., Oudeyer, P-Y., Bergen B. (2008)
Infant and Child Development, 17(1), pp. 55–80. Bibtex
Le corps comme variable expérimentale
Kaplan, F., Oudeyer, P-Y. (2008)
Revue Philosophique de la France et de l’Etranger, pp. 287–298. Bibtex
Intrinsic Motivation Systems for Autonomous Mental Development
Oudeyer P-Y, Kaplan, F. and Hafner, V. (2007)
IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286. DOI: 10.1109/TEVC.2006.890271. Bibtex
The progress-drive hypothesis: an interpretation of early imitation
Kaplan, F. and Oudeyer, P-Y. (2007b)
in Models and mechanisms of imitation and social learning: Behavioural, social and communication dimensions, eds. Dautenhahn, K. and Nehaniv, C., pp. 361–377, Cambridge University Press. 2005. Bibtex
What is intrinsic motivation? A typology of computational approaches
Oudeyer P-Y. and Kaplan F. (2007)
Frontiers in Neurorobotics, 1:6, doi: 10.3389/neuro.12.006.2007. Bibtex
In search of the neural circuits of intrinsic motivation
Kaplan F. and Oudeyer P-Y. (2007a)
Frontiers in Neuroscience, 1(1), pp. 225–236. Bibtex
Discovering Communication
Oudeyer P-Y., Kaplan F. (2006)
Connection Science, 18(2), pp. 189–206. Bibtex