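[Not a TA] In DDPG the action is chosen as the output of the policy network plus exploration noise, so the behavior policy that collects the data differs from the deterministic policy being learned. I think that is what makes it off-policy, and hence why you can use a replay buffer. I would also be interested in hearing a TA's opinion on this.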
![DDPG](https://moodlearchive.epfl.ch/2023-2024/pluginfile.php/3065291/mod_forum/post/198755/Screenshot_35.png)
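To make the distinction concrete, here is a minimal sketch of how I picture it, not the course's reference implementation. The linear-plus-tanh policy, the names `target_policy`, `behavior_action` and `ReplayBuffer`, the Gaussian noise, and the random stand-in for the environment are all my own assumptions for illustration.

```python
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Uniformly sample distinct transitions, regardless of their age.
        idx = rng.choice(len(self.storage), size=batch_size, replace=False)
        batch = [self.storage[i] for i in idx]
        return tuple(np.array(field) for field in zip(*batch))

    def __len__(self):
        return len(self.storage)


def target_policy(weights, state):
    """Deterministic policy mu(s); a hypothetical linear layer with tanh."""
    return np.tanh(weights @ state)


def behavior_action(weights, state, noise_std, rng):
    """Behavior policy: mu(s) plus Gaussian noise, clipped to [-1, 1]."""
    noise = rng.normal(0.0, noise_std, size=weights.shape[0])
    return np.clip(target_policy(weights, state) + noise, -1.0, 1.0)


rng = np.random.default_rng(0)
weights = rng.normal(size=(1, 3))  # assumed 3-dim state, 1-dim action
buffer = ReplayBuffer(capacity=1000)

state = rng.normal(size=3)
for _ in range(50):
    action = behavior_action(weights, state, noise_std=0.1, rng=rng)
    next_state = rng.normal(size=3)  # stand-in for a real env.step(action)
    reward = -float(np.sum(action ** 2))  # made-up reward, illustration only
    buffer.add(state, action, reward, next_state, False)
    state = next_state

states, actions, rewards, next_states, dones = buffer.sample(8, rng)
```

The stored `actions` come from the noisy behavior policy, whereas the critic's bootstrap target would evaluate `target_policy` at `next_states`. Since the update never assumes the stored action came from the current policy, old transitions in the buffer stay usable.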