CS-456: MP1 - DQN implementation

Hello,
In parts 3.1 to 3.3, you can implement either a single policy network or two networks (a policy network and a target network). Both approaches should work.
In part 3.4, we ask you to add two more networks to the agent (a predictor and a "target") for computing the RND reward. Do not confuse the target network of parts 3.1 to 3.3 with the target network of RND: they are separate networks with different roles. In total, you will therefore have either 3 or 4 networks in part 3.4, depending on whether you used a DQN target network in the earlier parts.
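
To make the network count concrete, here is a minimal sketch of how the agent could be organised. It is not the reference solution: the names (`TinyNet`, `Agent`, `use_dqn_target`, `rnd_dim`) are hypothetical, the "networks" are single random linear layers in pure Python standing in for real neural networks, and the RND reward is assumed to be the unnormalised squared error between the predictor and RND target outputs.

```python
import random


class TinyNet:
    """Stand-in for a neural network: one random linear layer (illustration only)."""

    def __init__(self, in_dim, out_dim, seed):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def forward(self, state):
        # Matrix-vector product: one output per weight row.
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]

    def copy_from(self, other):
        # Copy weights, as one would when syncing a DQN target network.
        self.weights = [row[:] for row in other.weights]


class Agent:
    def __init__(self, state_dim, n_actions, use_dqn_target=True, rnd_dim=4):
        # Parts 3.1 to 3.3: the policy network, plus an optional DQN target network.
        self.policy_net = TinyNet(state_dim, n_actions, seed=0)
        self.dqn_target_net = None
        if use_dqn_target:
            self.dqn_target_net = TinyNet(state_dim, n_actions, seed=0)
            self.dqn_target_net.copy_from(self.policy_net)

        # Part 3.4: two extra networks for RND. The RND target is unrelated
        # to the DQN target network above; in RND it stays fixed and only
        # the predictor is trained.
        self.rnd_predictor = TinyNet(state_dim, rnd_dim, seed=1)
        self.rnd_target = TinyNet(state_dim, rnd_dim, seed=2)

    def networks(self):
        nets = [self.policy_net, self.dqn_target_net,
                self.rnd_predictor, self.rnd_target]
        return [net for net in nets if net is not None]

    def rnd_reward(self, state):
        # Assumed form: squared prediction error, without any normalisation.
        pred = self.rnd_predictor.forward(state)
        target = self.rnd_target.forward(state)
        return sum((p - t) ** 2 for p, t in zip(pred, target))


if __name__ == "__main__":
    print(len(Agent(2, 3, use_dqn_target=True).networks()))   # with a DQN target network
    print(len(Agent(2, 3, use_dqn_target=False).networks()))  # policy network only
```

Giving the two target networks distinct attribute names (here `dqn_target_net` and `rnd_target`) is one simple way to avoid mixing them up in the update code.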
