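[Not a TA] In DDPG the executed action is the output of the policy network plus exploration noise, so the behaviour policy that collects the data differs from the deterministic policy being learned. I think that is what makes it off-policy, and hence why you can use a replay buffer. I would also be interested in hearing a TA's opinion on this.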
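To make the argument concrete, here is a rough sketch of how I picture it, not a full DDPG implementation. Everything in it is a toy assumption of mine: the scalar linear `actor` and `critic`, the dynamics, the reward, and all parameter values are hypothetical. The point is only that the stored action comes from the noisy behaviour policy, while the TD target bootstraps with the noise-free actor, so old transitions in the buffer remain usable.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.data), batch_size)


def actor(state, theta):
    """Hypothetical deterministic policy mu(s) = theta * s (scalar toy)."""
    return theta * state


def behaviour_action(state, theta, sigma, rng):
    """Action actually executed: mu(s) plus Gaussian exploration noise."""
    return actor(state, theta) + rng.gauss(0.0, sigma)


def critic(state, action, w):
    """Hypothetical linear critic Q(s, a) = w[0] * s + w[1] * a."""
    return w[0] * state + w[1] * action


def td_target(reward, next_state, done, theta_target, w_target, gamma):
    """Bootstrap with the noise-free actor, not with a noisy behaviour action."""
    if done:
        return reward
    next_action = actor(next_state, theta_target)
    return reward + gamma * critic(next_state, next_action, w_target)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
theta, w = 0.5, (0.1, 0.2)  # assumed toy parameters

# Collect data with the noisy behaviour policy.
state = 1.0
for _ in range(50):
    action = behaviour_action(state, theta, sigma=0.3, rng=rng)
    next_state = 0.9 * state + 0.1 * action  # assumed toy dynamics
    reward = -(next_state ** 2)              # assumed toy reward
    buffer.add((state, action, reward, next_state, False))
    state = next_state

# Learn from stored transitions: targets use the deterministic actor.
batch = buffer.sample(8, rng)
targets = [td_target(r, s2, d, theta, w, gamma=0.99) for (_, _, r, s2, d) in batch]
```

In real DDPG the actor and critic are neural networks with separate, slowly updated target copies; the sketch reuses `theta` and `w` as the target parameters only to stay short.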