Good afternoon!
In the description of DQN (3.1), it is mentioned that we need to implement the basic version of DQN. In the lectures we discussed that this approach involves a predictor and a target network, but in this project that concept is only introduced in part 3.4.
Does this mean that we have to implement DQN in 3.1 without a target network?
Hello,
In 3.1 to 3.3, it is possible to implement either a single policy network or two networks (a policy and a target network). Both should work.
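To illustrate the two-network option, here is a minimal sketch of the standard DQN target-network pattern: bootstrap targets are computed with a frozen copy of the policy network, which is hard-synced every few steps. The names (`policy_params`, `target_params`, `sync_every`) and the toy linear Q-function are illustrative assumptions, not from the project handout:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Q-network: Q(s) = s @ W (stand-in for a real neural network).
policy_params = rng.normal(size=(4, 2))   # 4 state features, 2 actions
target_params = policy_params.copy()      # target starts as a copy of the policy

sync_every = 100  # hard-update period (hypothetical hyperparameter)

def q_values(params, state):
    return state @ params

def td_target(reward, next_state, done, gamma=0.99):
    # Bootstrap with the *target* network, as in standard DQN.
    if done:
        return reward
    return reward + gamma * np.max(q_values(target_params, next_state))

for step in range(1, 301):
    # ... a real gradient step on policy_params would happen here ...
    policy_params += 0.001 * rng.normal(size=policy_params.shape)  # placeholder update
    if step % sync_every == 0:
        target_params = policy_params.copy()  # hard sync of the target network
```

With a single-network implementation, `td_target` would simply use `policy_params` instead, which is also accepted for parts 3.1 to 3.3.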
In 3.4, we ask you to add two additional networks (a predictor and a "target") to the agent for computing the RND reward. The target network of parts 3.1 to 3.3 should not be confused with the target network of RND. In total, you may have either 3 or 4 networks to solve part 3.4.
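To make the distinction concrete, here is a hedged sketch of the RND pair: the RND target is a *fixed* random network that is never trained, and the predictor is trained to imitate it; the intrinsic reward is the prediction error, which shrinks on states the agent visits often. The linear "networks" and all names (`rnd_target`, `rnd_predictor`, the learning rate) are illustrative assumptions, not the required implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed random target network: never trained (this is RND's "target").
rnd_target = rng.normal(size=(4, 8))
# Predictor network: trained to imitate the target's output.
rnd_predictor = np.zeros((4, 8))

def rnd_reward(state):
    # Intrinsic reward = mean squared prediction error on this state.
    err = state @ rnd_predictor - state @ rnd_target
    return float(np.mean(err ** 2))

def rnd_update(state, lr=0.1):
    global rnd_predictor
    # One gradient step on the MSE loss (closed form for these linear nets).
    err = state @ rnd_predictor - state @ rnd_target
    grad = np.outer(state, err) * (2.0 / err.size)
    rnd_predictor -= lr * grad

state = rng.normal(size=4)
before = rnd_reward(state)
for _ in range(200):
    rnd_update(state)
after = rnd_reward(state)
# A frequently visited state becomes predictable, so its intrinsic reward shrinks.
```

Note that this pair is entirely separate from the DQN target network of 3.1 to 3.3: the RND target is never synced from anything, and the predictor is the only one of the two that is trained.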