TY - UNPB
T1 - Using Genetic Programming to Improve Data Collection for Offline Reinforcement Learning
AU - Halder, David Roman
AU - Bação, Fernando
AU - Douzas, Georgios
N1 - info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04152%2F2020/PT#
https://doi.org/10.54499/UIDB/04152/2020#
Halder, D. R., Bação, F., & Douzas, G. (2024). Using Genetic Programming to Improve Data Collection for Offline Reinforcement Learning. (pp. 1-73). Social Science Research Network (SSRN), Elsevier. https://doi.org/10.2139/ssrn.4980054 --- This work was partially supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project - UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC/NOVA IMS)
PY - 2024/10/8
Y1 - 2024/10/8
N2 - Offline Reinforcement Learning (RL) learns policies from fixed, pre-collected datasets, making it applicable to use cases where data collection is risky. Consequently, the performance of these offline learners is highly dependent on the dataset. Still, the questions of how this data is collected and what characteristics it needs have not been thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), which combines the two learning paradigms to exploit their complementary attributes. This study aims to join these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL and its potential to enhance the performance of offline algorithms. A comparative approach was employed in which Deep Q-Networks (DQN) and GP were used for data collection across multiple environments and modes. The exploration and exploitation capabilities of these methods were quantified, and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation than DQN and produced datasets of high trajectory quality across all environments. More offline learners showed statistically significant performance gains when trained on GP-collected data than when trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as GP consistently generated high-quality datasets. This study showcases the effective combination of GP with offline learners, suggesting a promising avenue for future research on optimizing data collection for RL.
KW - Offline Reinforcement Learning
KW - Genetic Programming
KW - Evolutionary Reinforcement Learning
KW - Evolutionary Algorithms
KW - Data Efficiency
UR - https://run.unl.pt/handle/10362/174691
U2 - 10.2139/ssrn.4980054
DO - 10.2139/ssrn.4980054
M3 - Preprint
SP - 1
EP - 73
BT - Using Genetic Programming to Improve Data Collection for Offline Reinforcement Learning
PB - Social Science Research Network (SSRN), Elsevier
ER -