Curiosity, intrinsic motivation and information seeking in cognitive development

Humans, and some other animals, devote much time and energy to exploring and obtaining information, and sometimes the search for information can be independent of a foreseeable profit, as if learning were reinforcing in and of itself. This is associated with our high degree of curiosity, our intrinsic desire to know and understand.

Such intrinsic motivation mechanisms are observed throughout life, from infants’ spontaneous exploration of their bodies and external objects to adults reading novels or conducting research.

Together with my colleagues, I have been studying these mechanisms of curiosity-driven learning and information seeking within a systemic and multidisciplinary approach, modeling them using algorithmic and robotic tools in constant dialog with developmental psychology, neuroscience, and statistical theories of learning.

Beyond identifying the richness and variety of intrinsic motivation mechanisms, this work led us to establish fundamental links between curiosity-driven learning and cognitive development. In particular, we have shown that such mechanisms can self-organize complex developmental structures, where stages of increasing behavioral and cognitive complexity spontaneously form. For example, we showed that an intrinsic drive pushing a robot to search for situations where it experiences learning progress can spontaneously lead it to first explore and discover its own body, then external object affordances, and finally vocal and proto-linguistic interaction with others.

In such a vision, a limited set of meta-cognitive structures allows a learner to autonomously and actively select and order its own learning experiences, creating its own curriculum where skills, including the manipulation of external objects, get naturally sequenced towards increasing complexity.

In a related strand of research, detailed on this page, we have also used and extended these models for engineering purposes, and showed that such curiosity-driven active learning mechanisms have strong properties of robustness and efficiency for lifelong robot learning of sensorimotor skills, especially in the context of strategic and life-long learning.

Good starting points for reading about our work in this area are the following three articles:

Information Seeking, Curiosity and Attention: Computational and Neural Mechanisms
Gottlieb, J., Oudeyer, P-Y., Lopes, M., Baranes, A. (2013)
Trends in Cognitive Sciences. http://dx.doi.org/10.1016/j.tics.2013.09.001 Bibtex

Intrinsic Motivation Systems for Autonomous Mental Development
Oudeyer P-Y, Kaplan, F. and Hafner, V. (2007)
IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286. Bibtex

What is intrinsic motivation? A typology of computational approaches
Oudeyer P-Y. and Kaplan F. (2007)
Frontiers in Neurorobotics, 1:6. Bibtex

Direct links towards more details:

Intrinsic motivation, curiosity and self-organization of developmental structures in robots and humans
- IAC and the Playground Experiment: From discovering body affordances to communication
  - Open-ended, life-long and embodied acquisition of skills
- Curiosity-driven vocal development: From discovering phonation to canonical babbling (under construction)
- New hypothesis for understanding the role of intrinsic motivation in infant development
  - Self-Organized and Active Staged development
  - The regularities/diversity duality in developmental structures
  - The origins of the self/object/other distinction.
  - The origins of imitation (under construction)
  - The role of intrinsic motivation in vocal development (under construction)
  - Early Development of Communication and Language
- Interaction with other developmental constraints: imitation, maturation, cognitive abstraction, embodiment
Selected video talks
Selected experiments videos
Selected publications

Note: complementary information on this topic is also available on the FLOWERS INRIA team web site.

IAC and the Playground Experiment. The IAC (Intelligent Adaptive Curiosity) architecture and its implications were studied in a series of experiments called the Playground Experiments (Oudeyer and Kaplan, 2006; Oudeyer et al., 2007). Figure 1 illustrates the cognitive architecture employed by IAC. Prediction learning plays a central role: two modules in the model predict future states. First, the “Classic Machine learner” M learns a forward model. The forward model receives as input the current sensory state, context, and action, and predicts the sensory consequences of the planned action. The difference between predicted and observed consequences provides an error feedback signal, which is used to update the forward model. Second, the “Meta Machine learner” metaM receives the same input as M, but instead of predicting the sensory consequences, it learns a meta-model that predicts how much the errors of the lower-level forward model will decrease in local regions of the sensorimotor space, i.e. it models learning progress locally. To deal with the difficulties of generalization and of high-dimensional continuous spaces, an associated categorization mechanism progressively splits the sensorimotor space into sub-regions, for example by maximizing their differences in predictability (Baranes and Oudeyer, 2009), and focuses its refinement of the categorization on regions where learning progress is maximal. Then, in each observed context/state, an action selection system chooses stochastically which actions to try so as to maximize expected learning progress. Such a system allows the robot to automatically avoid actions whose outcome is either trivial or too difficult to predict or learn at a given moment of development, so that it first focuses on simple actions and progressively shifts to more complex ones.

Figure 1: The IAC algorithmic architecture for curiosity-driven learning and intrinsically motivated exploration
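The notion of local learning progress and the stochastic choice that follows it can be illustrated with a minimal Python sketch. It is not the IAC implementation: it assumes a fixed set of regions (no splitting), measures progress as the drop in mean prediction error between two consecutive windows, and uses an epsilon-greedy rule as one simple way to choose stochastically. The names RegionProgress, choose_region, window and epsilon, as well as the three hand-made error curves, are hypothetical.

```python
import random
from collections import deque


class RegionProgress:
    """Recent prediction errors of the forward model in one region."""

    def __init__(self, window=10):
        self.window = window
        # Holds two consecutive windows of errors: the older one, then the recent one.
        self.errors = deque(maxlen=2 * window)

    def add_error(self, error):
        self.errors.append(float(error))

    def learning_progress(self):
        """Mean error of the older window minus mean error of the recent window."""
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough samples yet to compare two windows
        errs = list(self.errors)
        older = sum(errs[:self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent


def choose_region(regions, epsilon=0.2, rng=random):
    """Pick a random region with probability epsilon, else the one with maximal progress."""
    if rng.random() < epsilon:
        return rng.randrange(len(regions))
    progress = [r.learning_progress() for r in regions]
    return max(range(len(regions)), key=progress.__getitem__)


# Three hypothetical regions with hand-made error curves (assumptions, not data).
trivial, learnable, unlearnable = RegionProgress(), RegionProgress(), RegionProgress()
for t in range(20):
    trivial.add_error(0.0)                             # already perfectly predicted
    learnable.add_error(1.0 / (t + 1))                 # error shrinks with practice
    unlearnable.add_error(0.9 if t % 2 == 0 else 1.1)  # error stays high, no trend

regions = [trivial, learnable, unlearnable]
best = choose_region(regions, epsilon=0.0)  # greedy choice, no random exploration
print(best, [round(r.learning_progress(), 3) for r in regions])
```

In this toy setting the trivial and the unlearnable regions both show zero progress, so the greedy choice falls on the region whose error is actually decreasing. This is the sense in which progress, rather than error itself, steers exploration.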

In order to evaluate the IAC architecture in a physical implementation, the Playground Experiments were developed (Oudeyer and Kaplan, 2006; Oudeyer et al., 2007). During the experiment, a quadruped robot is placed on an infant play mat and presented with a set of nearby objects, as well as an “adult” robot caretaker (see Figure 2). The robot is equipped with four kinds of motor primitives, each parameterized by several continuous numbers, which can be combined, thus forming an infinite set of possible actions: (a) turning the head in various directions; (b) opening and closing the mouth while crouching with various strengths and timing; (c) rocking the leg with various angles and speed; (d) vocalizing with various pitches and lengths. Similarly, several kinds of sensory primitives allow the robot to detect visual movement, salient visual properties, proprioceptive touch in the mouth, and the pitch and length of perceived sounds. For the robot, these motor and sensory primitives are initially black boxes, and it has no knowledge about their semantics, effects or relations. The IAC architecture is then used to drive the robot’s exploration and learning purely by curiosity, i.e. by the search for learning progress. The nearby objects include an elephant (which can be bitten or “grasped” by the mouth), a hanging toy (which can be “bashed” or pushed with the leg) and an adult robot “caretaker” pre-programmed to imitate the learning robot when the latter looks at the adult while vocalizing at the same time.

Figure 2: The Playground Experiment: a quadruped robot explores and learns physical and social affordances through curiosity-driven learning.
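To make the idea of an infinite action set built from a few parameterized primitives concrete, here is a small Python sketch. It only illustrates the data structure: the primitive names, parameter names and normalized ranges are assumptions chosen for readability and are not taken from the robot's actual controller.

```python
import random
from dataclasses import dataclass


@dataclass
class MotorPrimitive:
    name: str
    bounds: dict  # parameter name -> (low, high); names and ranges are illustrative

    def sample(self, rng):
        """Draw one continuous value per parameter, uniformly within its bounds."""
        return {p: rng.uniform(lo, hi) for p, (lo, hi) in self.bounds.items()}


# Hypothetical normalized parameterization of the four kinds of primitives.
PRIMITIVES = [
    MotorPrimitive("turn_head", {"pan": (-1.0, 1.0), "tilt": (-1.0, 1.0)}),
    MotorPrimitive("bite", {"strength": (0.0, 1.0), "timing": (0.0, 1.0)}),
    MotorPrimitive("rock_leg", {"angle": (-1.0, 1.0), "speed": (0.0, 1.0)}),
    MotorPrimitive("vocalize", {"pitch": (0.0, 1.0), "length": (0.0, 1.0)}),
]


def sample_action(primitives, rng):
    """Combine a random non-empty subset of primitives into one action."""
    k = rng.randint(1, len(primitives))
    chosen = rng.sample(primitives, k)
    return {p.name: p.sample(rng) for p in chosen}


rng = random.Random(0)
action = sample_action(PRIMITIVES, rng)
print(action)
```

Because every parameter is continuous and primitives can be combined freely, the learner cannot enumerate its actions. It has to decide where in this space exploration is worthwhile, which is the role played by learning progress.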

Open-ended and embodied acquisition of skills. A key finding from the Playground Experiments is the self-organization of structured developmental trajectories, where the robot explores objects and actions in a progressively more complex, stage-like manner, while autonomously acquiring diverse affordances and skills that can be reused later on. Across a series of runs of such experiments, the following developmental sequence is typically observed:

In a first phase, the robot achieves unorganized body babbling;
In a second phase, after learning a first rough model and meta-model, the robot stops combining motor primitives and explores them one by one, but each primitive is itself explored in a random manner;
In a third phase, the robot begins to experiment with actions directed towards zones of its environment where the external observer knows there are objects (the robot is not provided with a representation of the concept of “object”), but in a non-affordant manner (e.g. it vocalizes at the non-responding elephant or bashes at the adult robot, which is too far away to be touched);
In a fourth phase, the robot explores affordant experiments: it first focuses on grasping movements with the elephant, then shifts to bashing movements with the hanging toy, and finally shifts to exploring vocalizing towards the imitating adult robot.
In the end, the robot has learnt sensorimotor affordances with several objects, as well as social affordances with a peer, and masters multiple skills, yet none of these specific objectives were pre-programmed in the beginning. They self-organize through the dynamic interaction between intrinsic motivation, statistical inference, the properties of the body, and the properties of the environment.

New hypothesis for infant development. Two aspects of this outcome can be noted. First, it shows how an intrinsic motivation (IM) system can drive a robot to autonomously learn a variety of affordances and skills for which no engineer provided specific reward functions beforehand. Second, the observed process spontaneously generates three properties of infant development so far mostly unexplained:

Self-Organized and Active Staged development: Qualitatively different and more complex behaviours and capabilities appear over time, in a non-linear manner. Such unfolding is widely described in developmental psychology, but few principled explanations currently exist. The Playground Experiment provides the intriguing hypothesis that IM-driven exploration, in dynamic interaction with the body and environment, could explain important aspects of how this unfolding can occur spontaneously (for example, without an internal pre-programmed schedule that specifies to the organism what to do and when to do it). In particular, it suggests that developmental stages could be attractors of the dynamical system formed by the interaction between learning, curiosity, the body and the environment. Since this dynamical system continuously seeks learning progress, and thus changes itself, the attractors themselves change, leading to novel developmental states.
The regularities/diversity duality in developmental structures: The typical developmental trajectory described above is only the most frequent emerging trajectory. No two trajectories are exactly the same (e.g. the order of action exploration in the last phase might change). In some experiments, with the same robot, same mechanism and same environment, widely different trajectories can occur. The whole IM/body/environment system can be seen as a dynamical system with various attractors, and stochasticity can sometimes drive it into local minima far from the main attractor(s) (Thelen and Smith, 1993). This suggests a novel, principled IM-based mechanism to explain the regularities/diversity duality widely observed in infant development;
The origins of the self/object/other distinction. The categorization system associated with such an IM architecture also generates a progressive internal development of cognitive categories, which complements the behavioral and skill development described above. As explained in (Kaplan and Oudeyer, 2007b; Oudeyer et al., 2007), such a mechanism can allow the learning agent to progressively form fundamental categorical distinctions between “self”, “physical objects” and “others”, which are central in infant development.
The origins of imitation:
Early Development of Communication and Language: Through the same general mechanism, the robot both explores and learns how to manipulate objects and how to vocalize to trigger specific responses from a conspecific. While vocal babbling (Oller, 2000), and more generally language play and games, have been shown to be key in infant language development, an associated ad hoc motivation is typically assumed both in developmental psychology and in computational models. The Playground Experiment suggests that the exploration and learning of communicative behavior might be at least partially explained by general intrinsically motivated exploration of the body's affordances (Oudeyer and Kaplan, 2006). A more detailed study showed that curiosity-driven exploration of vocalizations can reproduce aspects of the developmental change in vocal babbling observed in human infants (Moulin-Frier et al., 2014). Further analysis of the links between IM and sensorimotor, social and language development can be found in (Kaplan et al., 2008).

Interaction with other developmental constraints: imitation, maturation, cognitive abstractions and embodiment.

Intrinsic motivation systems can be conceptualized as one among many interacting mechanisms that help organisms (natural or artificial) to explore and learn efficiently in very large sensorimotor spaces. Such other mechanisms include social guidance (e.g. imitation learning), cognitive abstraction (e.g. unsupervised perceptual learning that creates internal concepts or goals out of raw sensorimotor values), embodiment and maturation (i.e. evolution of morphological properties of the body). The following article discusses the importance of integrating these mechanisms within an entire cognitive system:

Intrinsically Motivated Learning of Real-World Sensorimotor Skills with Developmental Constraints
Oudeyer P-Y., Baranes A., Kaplan F. (2013)
in Intrinsically Motivated Learning in Natural and Artificial Systems, eds. Baldassarre G. and Mirolli M., Springer. Bibtex

Keywords: developmental robotics, epigenetic robotics, intrinsic motivation, curiosity, values, development, intrinsically motivated reinforcement learning, autonomy, behaviour, developmental trajectory, complexity, active learning.

Selected video talks

Selected experiments videos

The Playground Experiment. We have built an experimental setup, called the Playground Experiment, which allowed us to show how the curiosity algorithm we developed leads to the self-organization of developmental trajectories with sequences of behavioural stages of increasing complexity (Oudeyer et al., 2007; Oudeyer and Kaplan, 2006).

Learning omnidirectional quadruped locomotion. In this experiment, we showed how the successive architectures we developed allow a quadruped robot, initially equipped with parameterized motor primitives in the form of a 24-dimensional oscillator (sinusoids with various parameters in most of the joints), to learn to use these motor primitives to locomote precisely in all directions and in varied manners. In the article (Baranes and Oudeyer, 2013), we extensively study a physical simulation of this experimental setup with active learning algorithms.
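As a rough illustration of what a 24-dimensional sinusoidal motor primitive can look like, the Python sketch below maps a parameter vector to joint angle targets. The layout is an assumption made for the example, namely 12 joints with one amplitude and one phase each and a shared frequency. The source only states that the oscillator has 24 dimensions and drives most of the joints with sinusoids, so N_JOINTS, joint_angles and frequency_hz are hypothetical.

```python
import math

N_JOINTS = 12  # assumption: 12 joints x (amplitude, phase) = 24 parameters


def joint_angles(params, t, frequency_hz=1.0):
    """Return one target angle per joint at time t (seconds).

    params is a flat sequence [amp_0, phase_0, amp_1, phase_1, ...].
    """
    if len(params) != 2 * N_JOINTS:
        raise ValueError("expected %d parameters" % (2 * N_JOINTS))
    angles = []
    for j in range(N_JOINTS):
        amplitude, phase = params[2 * j], params[2 * j + 1]
        angles.append(amplitude * math.sin(2 * math.pi * frequency_hz * t + phase))
    return angles


# Example: every joint oscillates with amplitude 0.5 rad and zero phase.
params = [0.5, 0.0] * N_JOINTS
angles = joint_angles(params, t=0.25)
print(angles[0])
```

Under such a parameterization, learning to locomote amounts to discovering which points of the 24-dimensional parameter space produce which displacements of the robot, a mapping that is too large to sample uniformly and therefore benefits from active exploration.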

Selected publications:

Information Seeking, Curiosity and Attention: Computational and Neural Mechanisms
Gottlieb, J., Oudeyer, P-Y., Lopes, M., Baranes, A. (2013)
Trends in Cognitive Sciences, 17(11), pp. 585–596. http://dx.doi.org/10.1016/j.tics.2013.09.001 Bibtex Pdf preprint

Self-organization of early vocal development in infants and machines: the role of intrinsic motivation
Moulin-Frier, C., Nguyen, S.M., Oudeyer, P-Y. (2014)
Frontiers in Psychology (Cognitive Science), 4(1006). Bibtex

Active Choice of Teachers, Learning Strategies and Goals for a Socially Guided Intrinsic Motivation Learner
Nguyen, S.M., Oudeyer, P-Y. (2013)
Paladyn Journal of Behavioural Robotics.

Curiosity Driven Phonetic Learning
Moulin-Frier, C. and Oudeyer, P-Y. (2012)
in Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob), San Diego, USA. Bibtex (Best Paper Award, category “Models of Cognitive Development”).

Stable kernels and fluid body envelopes
Kaplan, F., Oudeyer, P-Y. (2009)
SICE Journal of Control, Measurement, and System Integration, 48(1).

Computational Models in the Debate over Language Learnability
Kaplan, F., Oudeyer, P-Y., Bergen B. (2008)
Infant and Child Development, 17(1), pp. 55–80. Bibtex

Le corps comme variable expérimentale
Kaplan, F., Oudeyer, P-Y. (2008)
Revue Philosophique de la France et de l’Etranger, pp. 287–298. Bibtex

Intrinsic Motivation Systems for Autonomous Mental Development
Oudeyer P-Y, Kaplan, F. and Hafner, V. (2007)
IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286. DOI: 10.1109/TEVC.2006.890271. Bibtex

The progress-drive hypothesis: an interpretation of early imitation
Kaplan, F. and Oudeyer, P-Y. (2007b)
In Dautenhahn, K. and Nehaniv, C., editors, Models and mechanisms of imitation and social learning: Behavioural, social and communication dimensions, pp. 361–377, Cambridge University Press. 2005. Bibtex

What is intrinsic motivation? A typology of computational approaches
Oudeyer P-Y. and Kaplan F. (2007)
Frontiers in Neurorobotics, 1:6, doi: 10.3389/neuro.12.006.2007. Bibtex

In search of the neural circuits of intrinsic motivation
Kaplan F. and Oudeyer P-Y. (2007a)
Frontiers in Neuroscience, 1(1), pp. 225–236. Bibtex

Discovering Communication
Oudeyer P-Y., Kaplan F. (2006)
Connection Science, 18(2), pp. 189–206. Bibtex

Flowers Laboratory
FLOWing Epigenetic Robots and Systems