Projects: Reinforcement Learning (RL)

The goal of Reinforcement Learning (RL) is to build learning agents that are connected to their environments through perception and action. We study two kinds of RL problems. In the first, the agent explores the environment and learns to act on its own. In the second, an expert provides demonstrations and the agent learns to mimic the expert (imitation learning and inverse reinforcement learning).

Imitation Learning

Humans and animals routinely learn new skills by observing others. This problem, called imitation learning, can be formulated as learning a representation of a policy (a mapping from states to actions) from examples of that policy. Imitation learning has a long history in machine learning and has been studied under a variety of names, including learning by observation, learning from demonstrations, programming by demonstration, programming by example, apprenticeship learning, behavioral cloning, and learning to act. Our focus is on relational domains, where states are naturally described by relations among an indefinite number of objects. We use Wargus, a real-time strategy game in which players must control their units in real time, as a complex testbed for this research. In the video below, the objective is to have at least one tower still standing at the end of the game. The red team is controlled by a learner trained on expert trajectories, while the blue team is played by the default AI engine. We use functional-gradient boosting to learn a policy that imitates the expert.
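As a rough illustration of the formulation above (learning a mapping from states to actions from examples of expert behavior), the following minimal sketch fits a tabular policy by majority vote. It is not the functional-gradient boosting approach used in this project and it ignores relational structure entirely; the function name `clone_policy` and the state and action names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def clone_policy(trajectories):
    """Fit a tabular policy by majority vote over expert (state, action) pairs."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state][action] += 1
    # Map each observed state to the action the expert chose most often.
    return {state: actions.most_common(1)[0][0] for state, actions in counts.items()}


# Hypothetical expert trajectories; the state and action names are made up.
expert_trajectories = [
    [("enemy_near_tower", "defend"), ("no_enemy", "gather")],
    [("enemy_near_tower", "defend"), ("no_enemy", "build")],
    [("enemy_near_tower", "attack"), ("no_enemy", "gather")],
]

policy = clone_policy(expert_trajectories)
```

A lookup table like this cannot generalize to unseen states, which is one reason relational domains with an indefinite number of objects call for richer policy representations.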

  • Members involved:
  • Kristian Kersting, Saket Joshi, Phillip Odom, and Sriraam Natarajan
  • Other collaborators:
  • Dr. Jude Shavlik (University of Wisconsin, Madison) and Dr. Prasad Tadepalli (Oregon State University)
  • Publications:
  • Sriraam Natarajan, Saket Joshi, Prasad Tadepalli, Kristian Kersting, and Jude Shavlik. Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach, International Joint Conference on Artificial Intelligence (IJCAI) 2011.

Inverse Reinforcement Learning (IRL)

The goal of IRL is to observe an agent acting in the environment and determine the reward function that the agent is optimizing. The observations include the agent's behavior over time, the measurements of the agent's sensory inputs, and the model of the environment. We consider the problem of IRL in different settings: multi-agent, relational domains, and problems with a small number of trajectories given expert advice.
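To make the setting concrete, here is a minimal sketch of one quantity that many IRL formulations compute from observed behavior: the expert's discounted feature expectations. It assumes a reward that is linear in state features, which is a common modeling assumption and not necessarily the one used in the work listed here. The function name `feature_expectations`, the feature map, and the trajectories are all hypothetical.

```python
def feature_expectations(trajectories, features, gamma=0.9):
    """Average discounted sum of state features over observed trajectories.

    trajectories: list of state sequences observed from the expert
    features: dict mapping each state to a list of feature values
    gamma: discount factor (an assumed value, not taken from the source)
    """
    dim = len(next(iter(features.values())))
    totals = [0.0] * dim
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            for i, value in enumerate(features[state]):
                totals[i] += (gamma ** t) * value
    return [total / len(trajectories) for total in totals]


# Hypothetical two-state example with one indicator feature per state.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
observed = [["a", "b"], ["a", "a"]]
mu_expert = feature_expectations(observed, features, gamma=0.5)
```

Under the linear-reward assumption, a candidate reward function can be judged by how closely the feature expectations of its optimal policy match those of the expert.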
  • Members involved:
  • Kristian Kersting, Gautam Kunapuli, and Sriraam Natarajan
  • Other collaborators:
  • Dr. Jude Shavlik (University of Wisconsin, Madison) and Dr. Prasad Tadepalli (Oregon State University)
  • Publications:
  • Sriraam Natarajan, Gautam Kunapuli, Kshitij Judah, Prasad Tadepalli, Kristian Kersting, and Jude Shavlik. Multi-Agent Inverse Reinforcement Learning, IEEE International Conference on Machine Learning and Applications (ICMLA) 2010.

Hierarchical Models

We also consider imitation learning in relational domains when the expert's policy has a hierarchical structure. Our research focuses on two kinds of problems: one in which the hierarchical structure is known, and one in which the hierarchy must be induced from trajectories.
  • Members involved:
  • Saket Joshi, Phillip Odom, and Sriraam Natarajan
  • Other collaborators:
  • Mandana Hamidi (Oregon State University) and Dr. Prasad Tadepalli (Oregon State University)
  • Publications to be updated soon.