Hi!
Just wanted to clarify something about question 5c: for the factorized agent, should we plot the heat map with the 8 values we get from the output layer of our network, or with the 16 Q-values that we then compute for the actions that can actually be taken?