Nim deep QL convergence speed

by Elia Fantini,
Number of replies: 0

Hello,

I couldn't find anyone else working on Nim to get feedback on their results, so I'm asking here. With deep Q-learning we get final results similar to Q-learning, but the convergence is much slower. Self-learning takes around double the number of games that normal deep Q-learning takes to reach a "plateau", just as with Q-learning, and the shape of the M-opt curve in the plots is similar between Q-learning and deep Q-learning as well; it is simply much slower. Concretely, with self-learning the DQN does not reach a plateau within 20k games, but it does after 25k games (while with normal deep Q-learning it reaches the plateau after 12k games). Is this normal? We are using the hyperparameters recommended in the project guidelines, with the DQN agent's epsilon = 0.1.
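For context, here is a minimal sketch of how we read the "plateau" point off a logged M-opt curve. All names (`plateau_game`, `m_opt`, `step`, `window`, `tol`) are illustrative and not from the project code; it simply finds the first game count at which the curve stays within a tolerance of its final value:

```python
def plateau_game(m_opt, step=250, window=8, tol=0.02):
    """Return the game count at which the M-opt curve first stays
    within `tol` of its final value for `window` consecutive samples.
    `m_opt` is assumed to be sampled every `step` games."""
    # Estimate the converged value from the tail of the curve.
    final = sum(m_opt[-window:]) / window
    for i in range(len(m_opt) - window + 1):
        chunk = m_opt[i:i + window]
        if all(abs(v - final) <= tol for v in chunk):
            return i * step  # games played at the start of the plateau
    return None  # never plateaus within the logged range

# Toy example: a curve that rises and then flattens near -0.1.
curve = [-1.0, -0.8, -0.5, -0.3, -0.15, -0.1, -0.1, -0.11, -0.1, -0.1,
         -0.09, -0.1, -0.1, -0.1, -0.1, -0.1]
print(plateau_game(curve, step=250))
```

With thresholds like these, "much slower convergence" just means this returned game count is several times larger for self-learning than for learning against the expert.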

Thank you,

Elia Fantini