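[Not a TA] In DDPG the action taken in the environment is the policy network's output plus exploration noise, so the behaviour policy that collects the data differs from the deterministic policy being learned. I think that is what makes it off-policy, and hence why you can use a replay buffer. I would also be interested in hearing a TA's opinion on this.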
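To make the point concrete, here is a rough sketch of how I picture it, not the course's or any library's implementation. The `actor` and `critic` are hypothetical linear stand-ins for the networks, the environment is just random states, the noise is Gaussian for simplicity, and target networks are left out. The thing to notice is that `td_target` bootstraps from `actor(next_state)` and never looks at which noisy action was actually taken next, so transitions collected by an older or noisier behaviour policy should still be usable.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the actor mu(s) and critic Q(s, a).
# In real DDPG both are neural networks; here they are fixed linear maps.
W_actor = rng.normal(size=(1, 3))   # state_dim = 3, action_dim = 1
w_critic = rng.normal(size=4)       # acts on concat(state, action)


def actor(state):
    """Deterministic policy mu(s) being learned, squashed to [-1, 1]."""
    return np.tanh(W_actor @ state)


def critic(state, action):
    """Q(s, a) as a scalar."""
    return float(w_critic @ np.concatenate([state, action]))


def behaviour_action(state, noise_std=0.1):
    """Behaviour policy: mu(s) plus exploration noise (Gaussian is an assumption)."""
    noise = rng.normal(scale=noise_std, size=1)
    return np.clip(actor(state) + noise, -1.0, 1.0)


def td_target(reward, next_state, done, gamma=0.99):
    """Bootstrap from mu(s'), not from the noisy action taken in the environment."""
    return reward + gamma * (1.0 - done) * critic(next_state, actor(next_state))


# Collect transitions with the noisy behaviour policy in a dummy environment.
buffer = deque(maxlen=10_000)
state = rng.normal(size=3)
for _ in range(100):
    action = behaviour_action(state)
    next_state = rng.normal(size=3)          # placeholder dynamics
    reward = -float(np.sum(state ** 2))      # placeholder reward
    buffer.append((state, action, reward, next_state, 0.0))
    state = next_state

# Sample old transitions uniformly and build critic targets from them.
batch = random.sample(list(buffer), 32)
targets = np.array([td_target(r, s2, d) for (_, _, r, s2, d) in batch])
```

The stored action only enters the critic's prediction `critic(state, action)` on the left-hand side of the update, which is fine for any action, whoever chose it.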