Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, and Sepp Hochreiter

[Figure: Align-RUDDER: the steps of reward redistribution.]

We previously developed RUDDER, a method for model-free reinforcement learning (RL) with delayed rewards. RUDDER solves complex RL tasks with sparse and delayed rewards through reward redistribution, which is obtained via return decomposition. It replaces expected future rewards with immediate rewards by redistributing the reward to the key events associated with changes in the return expectation.
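The idea of redistributing a delayed reward along changes in the return expectation can be sketched as follows. This is only an illustration: the function name and the list of per-step return predictions are hypothetical, and the predictor that would produce such values is assumed to exist and is not shown.

```python
from typing import List


def redistribute_by_differences(prefix_predictions: List[float]) -> List[float]:
    """Turn return predictions for growing prefixes into per-step rewards.

    prefix_predictions[t] is assumed to be the predicted episode return
    after seeing steps 0..t. The reward at step t is the change in that
    prediction, so steps that shift the expectation receive the reward.
    """
    rewards = []
    previous = 0.0  # assumption: the expectation before any step is 0
    for prediction in prefix_predictions:
        rewards.append(prediction - previous)
        previous = prediction
    return rewards


# Hypothetical predictions for a five-step episode with one delayed reward.
predictions = [0.0, 0.0, 0.5, 0.5, 1.0]
redistributed = redistribute_by_differences(predictions)

# The differences telescope, so the total equals the final prediction.
total = sum(redistributed)
```

In this toy episode only steps 2 and 4 change the prediction, so only they receive a nonzero reward, while the total stays equal to the final predicted return.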

Here, we introduce a modified method, Align-RUDDER, which facilitates discovering the key events with high rewards. To this end, Align-RUDDER uses profile models known from bioinformatics instead of LSTM networks. LSTM networks require a large number of examples for learning and often struggle to explore the relevant events. Align-RUDDER learns events with high rewards from demonstrations instead of exploration, and requires only a few demonstrations. A profile model is obtained from a multiple sequence alignment of the demonstrations. State-action sequences are aligned to the profile model and receive scores based on their similarity. From the differences in the alignment scores of consecutive sequences, the state-action pairs with high rewards are identified and rewards are redistributed to them. This allows adjusting the policy so that the relevant events are achieved more often, thus increasing the return.
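The scoring step can be illustrated with a deliberately simplified sketch. It is not the Align-RUDDER implementation. It assumes the demonstrations are already aligned and have equal length, ignores gaps, and uses a position-specific scoring matrix with a uniform background as a stand-in for the profile model. The event labels and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def build_pssm(aligned: List[List[str]], alphabet: List[str],
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds score per aligned column against a uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            event: math.log(((counts[event] + pseudocount) / total) / background)
            for event in alphabet
        })
    return pssm


def prefix_scores(sequence: List[str], pssm: List[Dict[str, float]]) -> List[float]:
    """Score every prefix of the sequence (no gaps; len(sequence) <= len(pssm))."""
    scores, running = [], 0.0
    for position, event in enumerate(sequence):
        running += pssm[position][event]
        scores.append(running)
    return scores


def redistribute(sequence: List[str], pssm: List[Dict[str, float]]) -> List[float]:
    """Reward at step t is the score difference between consecutive prefixes."""
    scores = prefix_scores(sequence, pssm)
    return [s - p for s, p in zip(scores, [0.0] + scores[:-1])]


# Hypothetical, already-aligned demonstrations over three event types.
alphabet = ["move", "key", "door"]
demos = [
    ["move", "key", "move", "door"],
    ["move", "key", "move", "door"],
    ["move", "move", "key", "door"],
]
pssm = build_pssm(demos, alphabet)

on_profile = redistribute(["move", "key", "move", "door"], pssm)
off_profile = redistribute(["door", "door", "door", "move"], pssm)
```

A sequence that follows the demonstrations gains score at every step and so receives positive redistributed rewards, while a step that the demonstrations never show at that position lowers the score and receives a negative reward. A faithful version would need a real multiple sequence alignment and gap handling, both omitted here.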

An implementation of Align-RUDDER is available on GitHub.

arXiv:2009.14108, 2020-09-29.

IARAI Authors: Dr Sepp Hochreiter
Research: Reinforcement Learning
Keywords: Delayed Reward, Multiple Sequence Alignment, Reinforcement Learning, Reward Redistribution
