Reinforcement Learning with Adaptive Representation Learning

Title: Reinforcement Learning with Adaptive Representation Learning
Summary: In reinforcement learning, most of the time, the state representation and actions are fixed and only the probabilities of taking the right action given the current state change over time. However, in this research, the representation itself is subject to being updated as we learn which features are more important for solving a task.
Keywords: Reinforcement Learning, Representation Learning, Deep Learning
TimeFrame:
References: Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, Simon S. Du, Sham M. Kakade, 2020

Learning State Representations for Query Optimization with Deep Reinforcement Learning, Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi, 2018

State Representation Learning for Control: An Overview, Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat, 2018

Prerequisites:
Author:
Supervisor: Alexander Galozy, Peyman Mashhadi
Level: Master
Status: Open


This project targets finding representations that make reinforcement learning more efficient by yielding an easier state-to-action mapping. As a concrete example, take the task of assessing a person's fitness, and assume that the data arrives in the form of images. Images are high-dimensional data that can take on many different states. This large state space makes it difficult to find an optimal action for the task in a reasonable amount of time. However, imagine that we could convert those images into another representation that extracts certain features such as weight, height, muscle mass, and similar features important for fitness evaluation. If we could find those features, then finding optimal actions would be much easier. The goal of this project is to learn, in a sequential manner, to map the incoming data into a much simpler and more informative representation for the task at hand.
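
As a purely illustrative sketch of this idea (the architecture, the name ObservationEncoder, the feature dimensionality, and the use of PyTorch are assumptions made here, not part of the project description), an encoder could compress an image observation into a handful of task-relevant features that then serve as the state for the learning agent:

import torch
import torch.nn as nn

# Sketch only: compresses a high-dimensional image observation into a compact
# feature vector (features playing the role of weight, height, muscle mass in
# the fitness example) that a policy or value function can use as its state.
class ObservationEncoder(nn.Module):
    def __init__(self, feature_dim: int = 8):
        super().__init__()
        # Convolutional stack reduces the raw image to a coarse spatial summary.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Linear head produces the low-dimensional state representation.
        self.head = nn.Linear(32, feature_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        z = self.conv(image).flatten(start_dim=1)
        return self.head(z)

# Usage: a batch of four 84x84 RGB observations becomes four 8-dimensional states.
encoder = ObservationEncoder()
state = encoder(torch.randn(4, 3, 84, 84))  # shape: (4, 8)

Acting on the 8-dimensional state instead of the raw pixels is what makes the state-to-action mapping easier to learn.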

In reinforcement learning, most of the time, the state representation and actions are fixed and only the probabilities of taking the right action given the current state change over time. However, in this research, the representation itself is subject to being updated as we learn which features are more important for solving a task. As a concrete example, one way to approach this is to place an attention mechanism on the features, selectively taking into account those features that maximize cumulative reward. Another approach could be to use an encoder that transforms the observations into a bottleneck representation, which provides the new states. Then, based on the action taken in the new state-action space, a new reward is calculated, and that reward is used to backpropagate through and update the representation.
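
The attention-based variant could, as a rough sketch, look as follows; the name AttentiveQNet, the softmax attention over input features, and the one-step temporal-difference update are assumptions standing in for whatever RL algorithm the project finally adopts. The point of the sketch is that the attention weights receive gradients from the same reward-derived loss as the Q-head, so the representation adapts while the task is being solved:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveQNet(nn.Module):
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        # Learnable logits; their softmax decides how strongly each input
        # feature contributes to the state used for action selection.
        self.attn_logits = nn.Parameter(torch.zeros(n_features))
        self.q = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                               nn.Linear(64, n_actions))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.attn_logits, dim=0)
        return self.q(features * weights)  # re-weighted features act as the state

# One illustrative temporal-difference update on a single transition (s, a, r, s').
net = AttentiveQNet(n_features=8, n_actions=4)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

s, s_next = torch.randn(1, 8), torch.randn(1, 8)
a, r, gamma = torch.tensor([0]), torch.tensor([1.0]), 0.99

q_sa = net(s).gather(1, a.unsqueeze(1)).squeeze(1)
with torch.no_grad():
    target = r + gamma * net(s_next).max(dim=1).values
loss = F.mse_loss(q_sa, target)

# Backpropagating the reward-derived loss updates the Q-head and the attention
# weights together, i.e. the representation itself is adapted.
opt.zero_grad()
loss.backward()
opt.step()

The encoder-plus-bottleneck variant would be trained in the same way, with the encoder's parameters included in the optimizer so that the reward signal reshapes the bottleneck representation that defines the new states.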