CS-456: Clarifications on Question 5 in DQN

Alright, so regarding:
5a) You do have an evaluation of Russo (the histograms from question 2b), and you should compare the performance of this policy to that of the learned policies. Of course, since Russo is not learned, you do not need to compare its training behavior; that comparison is only relevant for the DQN policies.
5c) Here, what we expect is somewhere in between your two suggestions: we consider the number of actions that can result from the output of the neural net at a given time-step. So for the binary and toggled policies it is indeed 2 and 5, respectively. For the factored agent, however, the number of actions is 16: because of the way the actions are factored together, the neural net can produce outputs that result in any combination of actions. This is not the case for the toggled scenario, where the neural net can technically only pick one of 5 different actions at any given time-step.
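To make the counting in 5c concrete, here is a minimal Python sketch. It assumes (this is not stated in the post) that the toggled and factored agents both act on four binary sub-actions, and that the toggled agent's fifth option is "change nothing". The function names are hypothetical and are not part of the course code.

```python
from itertools import product

# Assumption: the toggled and factored agents act on N_SUB = 4 binary
# sub-actions. This number is inferred from the counts 5 and 16, not given.
N_SUB = 4


def n_actions_binary() -> int:
    # A single on/off decision at each time-step.
    return 2


def n_actions_toggled(n_sub: int) -> int:
    # The net picks exactly one output per time-step: toggle one of the
    # n_sub sub-actions, or (assumed) leave everything unchanged.
    return n_sub + 1


def n_actions_factored(n_sub: int) -> int:
    # The net can output any on/off combination of the n_sub sub-actions,
    # so enumerate every combination explicitly (this equals 2 ** n_sub).
    return sum(1 for _ in product((0, 1), repeat=n_sub))


if __name__ == "__main__":
    print(n_actions_binary(), n_actions_toggled(N_SUB), n_actions_factored(N_SUB))
```

Under these assumptions the counts come out as 2, 5 and 16. The factored count grows exponentially with the number of sub-actions, while the toggled count grows only linearly, because the toggled net can change just one sub-action per time-step.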

I hope this helps,
Titouan
