Problem with loss for project DQN

by Lucas Antoine Reymond,
Number of replies: 8

Hello,

We have implemented Question 3.a for the DQN project and trained the network for 500 episodes with the parameters given in the .pdf: a fully connected neural net with 3 hidden layers (layer sizes: input size, 64, 32, 16, output size), a learning rate of 5·10⁻³, a discount factor of 0.9, a batch size of 2048, and a buffer size of 20000. Epsilon is 0.7.
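For context, our setup looks roughly like this (a minimal sketch in PyTorch rather than our exact code; the input/output sizes and the choice of optimizer are placeholders):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # three fully connected hidden layers of sizes 64, 32, 16
        self.net = nn.Sequential(
            nn.Linear(input_size, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, output_size),
        )

    def forward(self, x):
        return self.net(x)

# hyperparameters from the .pdf
LR = 5e-3           # learning rate
GAMMA = 0.9         # discount factor
BATCH_SIZE = 2048
BUFFER_SIZE = 20000
EPSILON = 0.7

q_net = QNetwork(input_size=8, output_size=4)  # placeholder sizes
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)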

We are still stuck with a network that is not learning. We plotted the cumulative reward (over the 30 weeks) for each episode and the loss over the 500 episodes, and we see that the loss increases far too much. We tried both F.mse_loss and F.smooth_l1_loss as the loss function and obtain something similar in both cases. Here are the plots of our cumulative reward and loss.

Has anyone encountered this problem and managed to fix it? Or could someone please give us some advice on how to solve this problem? 

Thanks a lot!

[Plot: loss over the 500 episodes]

In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Paul Charles Jacques Boulenger,
Hi!
Did you remove the clip_grad_value_ line that is in the provided PyTorch example?
I had a similar problem, and removing this line (or modifying it) more or less solved it.
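For reference, it is the gradient-clipping call in the optimization step, roughly in this position (a self-contained sketch rather than the provided code itself; the network, the dummy loss, and the clip value here are placeholders):

import torch
import torch.nn as nn

policy_net = nn.Linear(4, 2)  # stand-in for the Q-network from the example
optimizer = torch.optim.Adam(policy_net.parameters(), lr=5e-3)

# dummy loss just so the snippet runs end to end
loss = policy_net(torch.randn(8, 4)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# This is the line I removed / modified:
torch.nn.utils.clip_grad_value_(policy_net.parameters(), 100)
# A possible alternative is clipping the gradient norm instead:
# torch.nn.utils.clip_grad_norm_(policy_net.parameters(), max_norm=10.0)
optimizer.step()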
In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Titouan Alexis Arthur Renard,
A lot of different things can cause this kind of problem when implementing DQN, but essentially it indicates that the algorithm is failing to fit the data: somehow the Q-value estimates are becoming more and more wrong. My two best guesses about the cause (by no means an exhaustive list, I'm afraid) are the following:
1. Your loss is wrong. Something is incorrect in the way the loss is computed, so you are fitting a quantity that is not the Q-values; the algorithm then does not converge to anything stable and the loss blows up. Note that it can be wrong in a tricky way: the line where you compute the loss may look right and still compute the wrong value. A common source of such mistakes is automatic broadcasting behaving in a way you did not intend (see the shape sketch right after these two points).
2. Your data is not correctly processed. Mistakes in how the data is sampled can also make the loss blow up. For instance, a very common mistake I have seen is passing the wrong reward to the loss (i.e. mixing up the state-action-reward tuples so that rewards get associated with the wrong state-action pairs); this too can come from automatic broadcasting working in mysterious, unintended ways.
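To make the broadcasting point concrete, here is a sketch of a target and loss computation with the tensor shapes made explicit (hypothetical names and a dummy network, just to illustrate the shapes; adapt it to your own code):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in sizes and networks, only to make the shapes concrete
B, OBS_DIM, N_ACTIONS, GAMMA = 32, 4, 3, 0.9
q_net = nn.Linear(OBS_DIM, N_ACTIONS)       # plays the role of the Q-network
target_net = nn.Linear(OBS_DIM, N_ACTIONS)  # plays the role of the target network

# A dummy batch "sampled from the buffer"
states = torch.randn(B, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (B,))
rewards = torch.randn(B)
next_states = torch.randn(B, OBS_DIM)
dones = torch.zeros(B)

# Q-value of the action actually taken: shape [B]
q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

with torch.no_grad():
    q_next = target_net(next_states).max(dim=1).values    # [B]
    q_target = rewards + GAMMA * q_next * (1 - dones)      # [B]

# Both tensors are [B] here. If one were [B, 1] and the other [B],
# F.mse_loss would silently broadcast them to a [B, B] matrix and you
# would be fitting the wrong pairs: exactly the kind of subtle mistake
# described in points 1 and 2.
loss = F.mse_loss(q_pred, q_target)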
Sorry I'm not able to give a nice, concise solution to your problem, but I think these two leads have a good chance of resolving it. Best of luck!
In reply to Titouan Alexis Arthur Renard

Re: Problem with loss for project DQN

by Lucas Antoine Reymond,
Thank you for the help! We fixed it and it works now.
In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Eva Cramatte,
We have the same problem with the loss. Could you explain to me how you did it, please?
In reply to Eva Cramatte

Re: Problem with loss for project DQN

by Lucas Antoine Reymond,
The way we processed the data was wrong: we were not using the correct observation when optimizing. I hope this helps!
In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Camille Valentine Cathala,
Hello,
I am still having the same problem, and I don't see where the processing of the observations could be wrong. Was it in the states you use for computing the Q-values?
Thanks in advance
In reply to Camille Valentine Cathala

Re: Problem with loss for project DQN

by Lucas Antoine Reymond,
Hello,
The mistake we had was that we put the wrong observation in the buffer: over the 30 weeks, we were always giving the observation from the 1st week to the buffer in the learning step. After fixing this, the loss looked better.
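In rough terms, the bug looked something like this (a runnable sketch with tiny stand-ins for the environment, policy, and buffer; these names are hypothetical, not the project code):

import random

buffer = []                                  # stand-in replay buffer

def select_action(obs):
    return random.randint(0, 3)              # placeholder policy

def env_step(action):
    # placeholder environment: returns next_obs, reward, done
    return random.random(), random.random(), False

obs = 0.0          # week-1 observation (placeholder)
first_obs = obs

for week in range(30):
    action = select_action(obs)
    next_obs, reward, done = env_step(action)

    # Bug: we always stored the week-1 observation
    # buffer.append((first_obs, action, reward, next_obs, done))

    # Fix: store the observation the action was actually taken from
    buffer.append((obs, action, reward, next_obs, done))

    obs = next_obs   # move on to the next week's observation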