Reinforcement Learning with Adaptive Representation Learning
Title | Reinforcement Learning with Adaptive Representation Learning |
---|---|
Summary | This project targets learning representations that make reinforcement learning more efficient by yielding a simpler state-to-action mapping. |
Keywords | Reinforcement Learning, Representation Learning, Deep Learning |
TimeFrame | |
References | Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, Simon S. Du, Sham M. Kakade, 2020; Learning State Representations for Query Optimization with Deep Reinforcement Learning, Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi, 2018; State Representation Learning for Control: An Overview, Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat, 2018 |
Prerequisites | |
Author | |
Supervisor | Alexander Galozy, Peyman Mashhadi |
Level | Master |
Status | Open |
This project targets learning representations that make reinforcement learning more efficient by yielding a simpler state-to-action mapping. As a concrete example, consider the task of assessing a person's fitness, and assume that the data arrives in the form of images. Images are high-dimensional data that can take many different states. This large state space makes it difficult to find an optimal action for the task in a reasonable amount of time. However, imagine that we could convert those images into another representation that extracts features such as weight, height, muscle mass, and other attributes important for fitness evaluation. If we could find those features, finding optimal actions would become much easier. The goal of this project is to sequentially learn a much simpler and more informative representation of the incoming data for the task at hand.
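As a rough illustration of the kind of mapping this project targets, the sketch below (in PyTorch) compresses a high-dimensional image observation into a small feature vector that a policy could act on. All layer sizes, the class name `StateEncoder`, and the latent dimension are hypothetical choices, not a prescribed design.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Maps a high-dimensional image observation to a compact state vector.

    Architecture and sizes are illustrative only; the actual representation
    would be learned for the task at hand.
    """
    def __init__(self, latent_dim: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Bottleneck: a handful of task-relevant features (e.g. "weight",
        # "height", "muscle mass") would live in this low-dimensional space.
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        h = self.conv(image).flatten(1)   # (batch, 32)
        return self.fc(h)                 # (batch, latent_dim) compact state


if __name__ == "__main__":
    obs = torch.randn(1, 3, 64, 64)       # a single 64x64 RGB observation
    state = StateEncoder()(obs)
    print(state.shape)                     # torch.Size([1, 8])
```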
In reinforcement learning, the state representation and actions are usually fixed, and only the probabilities of choosing the right action given the current state change over time. In this project, however, the representation itself is updated as we learn which features matter most for solving the task. One concrete approach is to place an attention mechanism over the features, selectively taking into account those that maximize the cumulative reward. Another approach is to use an encoder that transforms the observations into a bottleneck representation, which then defines the new states. Based on the action taken in this new state-action space, a reward is calculated and backpropagated to update the representation; both ideas are sketched below.
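The following sketch combines the two ideas under stated assumptions: a learned attention mask over the features produced by the hypothetical `StateEncoder` above, and a plain REINFORCE update whose gradient flows through the policy, the attention weights, and the encoder, so the reward signal reshapes the representation. The class `AttentivePolicy`, the function `reinforce_update`, and all dimensions are illustrative, not the project's prescribed method.

```python
import torch
import torch.nn as nn

class AttentivePolicy(nn.Module):
    """Policy with a learned attention mask over compact state features.

    Both the attention weights and the upstream encoder receive gradients
    from the reward, so the representation adapts to the task.
    """
    def __init__(self, n_features: int = 8, n_actions: int = 4):
        super().__init__()
        self.attention = nn.Parameter(torch.zeros(n_features))  # per-feature weights
        self.head = nn.Linear(n_features, n_actions)

    def forward(self, features: torch.Tensor) -> torch.distributions.Categorical:
        weighted = features * torch.sigmoid(self.attention)     # soft feature selection
        return torch.distributions.Categorical(logits=self.head(weighted))


def reinforce_update(encoder, policy, optimizer, trajectory):
    """One REINFORCE step over a list of (observation, action, reward) tuples.

    The return is backpropagated through policy, attention, and encoder,
    so the state representation itself is updated by the reward.
    """
    episode_return = sum(r for _, _, r in trajectory)
    loss = 0.0
    for obs, action, _ in trajectory:
        dist = policy(encoder(obs))
        loss = loss - dist.log_prob(action) * episode_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Example wiring (hypothetical): one optimizer over encoder and policy, so the
# reward gradient reaches the representation.
# encoder = StateEncoder(); policy = AttentivePolicy()
# optimizer = torch.optim.Adam(
#     list(encoder.parameters()) + list(policy.parameters()), lr=1e-3)
```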