CS-456: DDPG | Moodle

DDPG

[Not a TA] In DDPG, the executed action is the policy network's output plus exploration noise, so the data is collected by a behaviour policy that differs from the deterministic policy being learned. I think that makes it off-policy, and hence you can use a replay buffer. I would also be interested in hearing a TA's opinion on this.
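
To make the point concrete, here is a minimal NumPy sketch of how I picture it; it is not the course's reference implementation. The linear `actor` and `critic`, the Gaussian noise, the weight names and the random "environment" are all stand-ins I made up for illustration, and target networks are left out for brevity. The thing to look at is that the stored action is the noisy one, while the TD target uses the noise-free policy output at the next state.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)


def actor(state, weights):
    """Hypothetical deterministic policy mu(s): linear layer followed by tanh."""
    return np.tanh(weights @ state)


def behaviour_action(state, weights, noise_std=0.1):
    """Action actually executed: mu(s) plus Gaussian noise, clipped to [-1, 1]."""
    mu = actor(state, weights)
    noise = rng.normal(0.0, noise_std, size=mu.shape)
    return np.clip(mu + noise, -1.0, 1.0)


def critic(state, action, q_weights):
    """Hypothetical linear critic Q(s, a)."""
    return float(q_weights @ np.concatenate([state, action]))


def td_target(reward, next_state, done, weights, q_weights, gamma=0.99):
    """Target uses mu(s') without noise, not the action the agent took next."""
    next_action = actor(next_state, weights)
    bootstrap = critic(next_state, next_action, q_weights)
    return reward + gamma * (1.0 - float(done)) * bootstrap


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(list(self.storage), batch_size)
        return tuple(np.array(column) for column in zip(*batch))

    def __len__(self):
        return len(self.storage)


state_dim, action_dim = 3, 1
weights = rng.normal(size=(action_dim, state_dim))
q_weights = rng.normal(size=state_dim + action_dim)
buffer = ReplayBuffer(capacity=1000)

state = rng.normal(size=state_dim)
for _ in range(64):
    action = behaviour_action(state, weights)   # noisy behaviour policy
    next_state = rng.normal(size=state_dim)     # stand-in for env.step(action)
    reward = -float(np.sum(state ** 2))         # made-up reward
    buffer.add(state, action, reward, next_state, False)
    state = next_state

# Transitions collected under older policies and noise can be reused here.
states, actions, rewards, next_states, dones = buffer.sample(16)
targets = np.array([
    td_target(r, s2, d, weights, q_weights)
    for r, s2, d in zip(rewards, next_states, dones)
])
```

Since the target never depends on how the stored action was generated, old transitions in the buffer stay usable after the policy has changed, which is how I understand the off-policy argument.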
