Combining Evolution and Deep Reinforcement Learning for Policy Search: a Survey

Contributors:

Policy Search

PEPG: Parameter-exploring policy gradients, Sehnke F et al, 2010, Neural Networks.
NES: Natural evolution strategies, Wierstra D et al, 2014, The Journal of Machine Learning Research.
OpenAI-ES: Evolution strategies as a scalable alternative to reinforcement learning, Salimans T et al, 2017.
GA: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, Such F P et al, 2017.
NS-ES/NSR-ES/NSRA-ES: Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, Conti E et al, 2018, NeurIPS.
TRES: Trust region evolution strategies, Liu G et al, 2019, AAAI.
Guided ES: Guided evolutionary strategies: Augmenting random search with surrogate gradients, Maheswaranathan N et al, 2019, ICML.
PBT: Population based training of neural networks, Jaderberg M et al, 2017.
PB2: Provably efficient online hyperparameter optimization with population-based bandits, Parker-Holder J et al, 2020, NeurIPS.
SEARL: Sample-efficient automated deep reinforcement learning, Franke J K H et al, 2020.
DERL: Embodied intelligence via learning and evolution, Gupta A et al, 2021, Nature Communications.

ERQL: Bootstrapping Q-learning for robotics from neuro-evolution results, Zimmer M et al, 2017, IEEE.
GEP-PG: GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms, Colas C et al, 2018, ICML.
ERL: Evolution-guided policy gradient in reinforcement learning, Khadka S et al, 2018, NeurIPS.
CEM-RL: CEM-RL: Combining evolutionary and gradient-based methods for policy search, Pourchot A et al, 2018.
CERL: Collaborative evolutionary reinforcement learning, Khadka S et al, 2019, ICML.
PDERL: Proximal distilled evolutionary reinforcement learning, Bodnar C et al, 2020, AAAI.
RIM: Recruitment-imitation mechanism for evolutionary reinforcement learning, Lü S et al, 2021, Information Sciences.
ESAC: Maximum mutation reinforcement learning for scalable control, Suri K et al, 2020.
QD-RL: QD-RL: Efficient mixing of quality and diversity in reinforcement learning, Cideron G et al, 2020.
SUPE-RL: Genetic soft updates for policy evolution in deep reinforcement learning, Marchesini E et al, 2020, ICLR.

PPO-CMA: PPO-CMA: Proximal policy optimization with covariance matrix adaptation, Hämäläinen P et al, 2020, IEEE.
EPG: Evolved policy gradients, Houthooft R et al, 2018, NeurIPS.
CGP: Q-learning for continuous actions with cross-entropy guided policies, Simmons-Edler R et al, 2019.
GRAC: GRAC: Self-guided and self-regularized actor-critic, Shao L et al, 2022, CoRL.