Improving Option Learning with Hindsight Experience Replay

Published: 01 Apr 2025, Last Modified: 01 May 2025 · ALA · CC BY 4.0
Keywords: Reinforcement Learning, Sparse Rewards, Multi-Goal Environments, Options Framework, Temporal Abstraction
TL;DR: We integrate Hindsight Experience Replay (HER) into the Multi-updates Option Critic (MOC) framework to enhance option learning in multi-goal environments with sparse rewards, surpassing standard MOC in both sparse and dense reward settings.
Abstract: Algorithms such as Option-Critic (OC) and Multi-updates Option Critic (MOC) have introduced significant advances in the discovery and autonomous learning of options. However, these methods still tend to underperform in multi-goal environments or those with sparse rewards. In this work, we propose integrating Hindsight Experience Replay (HER) into MOC to improve performance in these scenarios. To achieve this, the algorithm selects new goals from previously reached states and recomputes the rewards of past transitions, so that even unsuccessful trajectories can be leveraged as if the intended objective had been achieved. Our method, which we refer to as MOC-HER, successfully solves multi-goal environments with sparse rewards where traditional Hierarchical Reinforcement Learning algorithms fail. Moreover, when testing the algorithm in the same environments with dense rewards, we observed significant improvements over the original MOC.
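The relabeling mechanism described in the abstract can be sketched as follows. This is a minimal illustration of HER-style goal relabeling (the "future" strategy from the original HER paper), not the authors' MOC-HER implementation; the function names, transition tuple layout, and sparse-reward convention are assumptions for the example.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling sketch ('future' strategy).

    `episode` is a list of (state, action, achieved_goal, desired_goal)
    tuples; `reward_fn(achieved, goal)` returns the sparse reward.
    Names and structure are illustrative, not from the paper's code.
    """
    relabeled = []
    for t, (state, action, achieved, goal) in enumerate(episode):
        # Keep the original transition with its true goal and reward.
        relabeled.append((state, action, goal, reward_fn(achieved, goal)))
        # Sample up to k substitute goals from states reached later
        # in the same episode, and recompute the reward as if each
        # substitute goal had been the intended objective.
        future = episode[t:]
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future)[2]
            relabeled.append(
                (state, action, new_goal, reward_fn(achieved, new_goal))
            )
    return relabeled

# Sparse reward: 0 on success, -1 otherwise (a common HER convention).
def sparse_reward(achieved, goal):
    return 0.0 if achieved == goal else -1.0
```

Because the substitute goals are drawn from states the agent actually reached, some relabeled transitions receive a success reward even in episodes where the true goal was never achieved, which is what makes otherwise-unsuccessful trajectories informative under sparse rewards.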
Type Of Paper: Full paper (max 8 pages)
Anonymous Submission: Anonymized submission.
Submission Number: 7