Human-Robot Collaboration
Active Inverse Reinforcement Learning
Inverse reinforcement learning addresses the general problem of
recovering a reward function from samples of a policy provided
by an expert/demonstrator. In this paper, we introduce active
learning for inverse reinforcement learning. We propose an
algorithm that allows the agent to query the demonstrator for
samples at specific states, instead of relying only on samples
provided at arbitrary states. The purpose of our algorithm is
to estimate the reward function with accuracy comparable to other
methods from the literature while reducing the number of policy
samples required from the expert. We also discuss the use of
our algorithm in higher-dimensional problems, using both Monte
Carlo and gradient methods. We present illustrative results of
our algorithm in several simulated examples of varying
complexity.
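To make the query-selection idea concrete, here is a minimal Python sketch, assuming a Monte Carlo posterior represented by sampled reward functions over a small discrete MDP; the names and the entropy criterion shown are an illustration, not the paper's code:

```python
import numpy as np

def optimal_policy(reward, P, gamma=0.95, iters=200):
    """Greedy policy via value iteration for one sampled reward.
    reward: (S,) state rewards; P: (A, S, S) transition matrices."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = reward[None, :] + gamma * (P @ V)   # (A, S) action values
        V = Q.max(axis=0)
    return Q.argmax(axis=0)                     # best action per state

def query_state(reward_samples, P):
    """Pick the state where the policies induced by the posterior
    reward samples disagree the most (highest action entropy)."""
    A, S, _ = P.shape
    counts = np.zeros((S, A))
    for r in reward_samples:                    # Monte Carlo posterior
        counts[np.arange(S), optimal_policy(r, P)] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return int(entropy.argmax())                # ask the expert here
```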
Active Learning for Reward Estimation in Inverse Reinforcement Learning,
Manuel Lopes, Francisco Melo and Luis Montesano. European Conference on
Machine Learning (ECML/PKDD), Bled, Slovenia, 2009. (pdf)
Simultaneous Acquisition of Task and Feedback Models
We present a system that learns task representations from
ambiguous feedback. We consider an inverse reinforcement
learner that receives feedback from a teacher with an unknown
and noisy protocol. The system needs to estimate simultaneously
what the task is (i.e., how to find a compact representation of
the task goal) and how the teacher is providing the feedback.
We further explore the problem of ambiguous protocols by
considering that the words used by the teacher have an unknown
relation to the actions and meanings expected by the robot.
This allows the system to start with a set of known signs and
learn the meaning of new ones. We present computational results
showing that it is possible to learn the task under noisy
and ambiguous feedback. Using an active learning approach, the
system is able to reduce the length of the training period.
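One way to picture the joint estimation problem is as Bayesian inference over pairs of hypotheses. The sketch below scores every (task, sign-meaning) pair by the likelihood of the observed feedback, assuming finite hypothesis sets, binary correct/incorrect meanings and a fixed noise rate; it is a simplified illustration, not the protocol of the papers below:

```python
import numpy as np
from itertools import product

def joint_posterior(signals, actions, tasks, meanings, noise=0.1):
    """Posterior over (task, sign-meaning map) pairs from ambiguous feedback.
    tasks:    callables mapping an action to 'correct' / 'incorrect'
    meanings: callables mapping a teacher sign to 'correct' / 'incorrect'"""
    hypotheses = list(product(range(len(tasks)), range(len(meanings))))
    log_p = np.zeros(len(hypotheses))
    for h, (t, m) in enumerate(hypotheses):
        for sign, act in zip(signals, actions):
            intended = tasks[t](act)        # label the teacher meant
            decoded = meanings[m](sign)     # label this protocol assigns
            # noisy teacher: wrong sign with probability `noise`
            log_p[h] += np.log(1 - noise if decoded == intended else noise)
    p = np.exp(log_p - log_p.max())         # normalize in a stable way
    return hypotheses, p / p.sum()
```

In this picture, active learning amounts to choosing the next action whose feedback best separates the remaining hypotheses, which is how the training period can be shortened.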
Robot Learning Simultaneously a Task and How to Interpret Human
Instructions, Jonathan Grizou, Manuel Lopes and Pierre-Yves Oudeyer.
ICDL-EpiRob, Osaka, Japan, 2013. (pdf)
Simultaneous Acquisition of Task and Feedback Models, Manuel Lopes,
Thomas Cederborg and Pierre-Yves Oudeyer. IEEE International
Conference on Development and Learning (ICDL), Germany, 2011. (pdf)
Robot Self-Initiative and Personalization by Learning through Repeated Interactions
We have developed a robotic system that interacts with the user
and, through repeated interactions, adapts to the user so that
it becomes semi-autonomous and acts proactively. In this work we
show how to design a system that meets a user's preferences, how
robot proactivity can be learned, and how to provide an integrated
system using verbal instructions. All these behaviors are
implemented on a real platform and evaluated in terms of user
acceptability and efficiency of interaction.
A demonstration of a sequence of actions (video).
Robot Self-Initiative and Personalization by Learning through
Repeated Interactions, Martin Mason and Manuel Lopes. 6th ACM/IEEE
International Conference on Human-Robot Interaction (HRI'11),
Lausanne, Switzerland, 2011. (pdf)
Task Inference and Imitation
In this demo the robot is shown a demonstration of a task. It
knows how to grasp and how to tap, and, by knowing the object
affordances, it is able to recognize these actions from their
effects on objects. After observing the demonstration and
segmenting it into its components, the robot needs to infer the
goal/intention behind it. Inverse reinforcement learning
algorithms can be used to infer the reward function that best
explains the observed behavior. Interestingly, different tasks
(e.g., considering subgoals or not) are inferred depending on
the robot's knowledge about the world and on contextual
restrictions.
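A minimal sketch of this reward-inference step, assuming a Boltzmann (softmax) model of the demonstrator on a small discrete MDP; candidate reward functions (with or without subgoals) would be ranked by this score. This is a generic Bayesian-IRL-style likelihood, not the exact algorithm of the cited papers:

```python
import numpy as np

def demo_log_likelihood(reward, demo, P, gamma=0.95, beta=5.0, iters=200):
    """How well a candidate reward explains a demonstration, under a
    Boltzmann expert model. demo: list of (state, action) pairs."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):                  # soft value iteration
        Q = beta * (reward[None, :] + gamma * (P @ V))
        m = Q.max(axis=0)
        V = (m + np.log(np.exp(Q - m).sum(axis=0))) / beta
    log_pi = Q - (beta * V)[None, :]        # log softmax policy (A, S)
    return sum(log_pi[a, s] for s, a in demo)

# The inferred goal/subgoal structure is whichever candidate reward
# maximizes this score, e.g.:
# best = max(candidate_rewards, key=lambda r: demo_log_likelihood(r, demo, P))
```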
A demonstration of a sequence of actions (video 1) is
imitated by Baltazar (video 2).
Relevant publications: Lopes et al. AdBeh09 (pdf)
Lopes et al. IROS07 (pdf)
Affordance-Based Imitation
It is possible to imitate (parts of) a demonstration without
having to completely track and recognize all of its components.
Interestingly, to copy actions applied to objects it is not
necessary to follow the hand of the demonstrator. If the
affordances of objects are known or learned, we can build a
statistical model of the action-object-effect relation. To
recognize a given action, or to plan the action that achieves a
given effect, statistical inference can then be performed in
this model.
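A toy illustration of such a model: a conditional probability table P(effect | action, object) in which recognition and planning are both a single inference step (all numbers are invented for the sketch):

```python
import numpy as np

actions = ["grasp", "tap"]
objects = ["ball", "box"]
effects = ["moves", "stays"]
# P(effect | action, object); axes: action x object x effect
P_eff = np.array([
    [[0.2, 0.8], [0.1, 0.9]],   # grasp on: ball, box
    [[0.9, 0.1], [0.4, 0.6]],   # tap on:   ball, box
])

def recognize_action(obj, eff):
    """Bayes rule with a uniform prior over actions:
    P(action | object, effect) is proportional to
    P(effect | action, object)."""
    lik = P_eff[:, objects.index(obj), effects.index(eff)]
    return actions[int(lik.argmax())]

def plan_action(obj, desired_eff):
    """Choose the action most likely to produce the desired effect."""
    lik = P_eff[:, objects.index(obj), effects.index(desired_eff)]
    return actions[int(lik.argmax())]

print(recognize_action("ball", "moves"))   # -> tap
print(plan_action("box", "stays"))         # -> grasp
```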
This experiment (video)
shows how to copy the effects of actions just by using the
information provided by the affordances.
Affordance-based imitation learning in robots, Manuel Lopes,
Francisco S. Melo and Luis Montesano. IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), San Diego, USA, 2007. (pdf)
Behavioral Switching in Animals and Children
Learning by imitation is an easy way to program robots to
perform tasks. To enable this capability it is necessary to have
knowledge about the world's properties (e.g., object affordances,
world dynamics), to infer the intention of the demonstrator (at
least the motor intention), and to map actions and effects
between different body morphologies. It is interesting to note
that the assumptions we make about these processes change the
resulting imitating behavior. This behavioral switching has been
deeply studied in psychology and neurophysiology. Our paper
discusses these aspects and provides a computational method that
models behavioral-switching experiments from psychology and is
also applied to robot learning.
A Computational Model of Social-Learning Mechanisms, Manuel
Lopes, Francisco S. Melo, Ben Kenward and José Santos-Victor.
Adaptive Behavior, 17(6):467-483, 2009.
Reflexive Imitation Based on Perceptual-Motor Maps
This experiment shows Baltazar mimicking human gestures
(movements). While observing a gesture performed by a human
(video 1), the robot processes the image (video 2), subtracting
the background in order to estimate the body, shoulder and hand
positions. Finally, using the visuo-motor map, the system is
able to mimic the gesture (video 3).
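As a minimal sketch, a visuo-motor map can be as simple as an affine regression from image coordinates of the hand to joint angles, fitted on the robot's own motor-babbling data (the linear form and the function names are illustrative simplifications):

```python
import numpy as np

def learn_visuomotor_map(hand_pix, joints):
    """Fit an affine map from image-space hand positions to joint angles,
    using motor-babbling data (the robot moves and watches its own hand).
    hand_pix: (N, 2) observed hand positions; joints: (N, J) joint angles."""
    X = np.hstack([hand_pix, np.ones((len(hand_pix), 1))])  # affine features
    W, *_ = np.linalg.lstsq(X, joints, rcond=None)          # (3, J) weights
    return lambda p: np.append(p, 1.0) @ W                  # pixels -> joints

# Mimicry: feed each tracked point of the human hand trajectory through
# the map to obtain the joint angles that reproduce the gesture:
# vm_map = learn_visuomotor_map(babble_pix, babble_joints)
# mimic = [vm_map(p) for p in observed_hand_trajectory]
```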
Relevant publications:
Lopes et al. SMC(B)05 (pdf)
Lopes et al. ICRA03 (pdf)