Problem with loss for project DQN

by Lucas Antoine Reymond -
Number of replies: 8

Hello,

We have implemented Question 3.a for the DQN project and trained the network for 500 episodes with the parameters given in the .pdf: a fully connected neural net with 3 hidden layers, i.e. layers of size (input size, 64, 32, 16, output size), a learning rate of 5 · 10⁻³, a discount factor of 0.9, a batch size of 2048, a buffer size of 20000, and epsilon = 0.7.
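For reference, a minimal sketch of such an architecture (the input/output sizes here are placeholders and the ReLU activations are an assumption, not something taken from the .pdf):

import torch.nn as nn

# Hypothetical input/output sizes, just for illustration.
INPUT_SIZE, OUTPUT_SIZE = 8, 4

# Fully connected Q-network with hidden layers of size 64, 32 and 16.
q_net = nn.Sequential(
    nn.Linear(INPUT_SIZE, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, OUTPUT_SIZE),
)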

We are still stuck with a network that is not learning. We plotted the cumulative reward (over the 30 weeks) for each episode and the loss over the 500 episodes, and the loss keeps increasing. We tried the loss functions F.mse_loss and F.smooth_l1_loss and obtain something similar in both cases. Here are the plots of our cumulative rewards and loss.

Has anyone encountered this problem and managed to fix it? Or could someone please give us some advice on how to solve this problem? 

Thanks a lot!

[Plots: cumulative reward per episode and loss over the 500 episodes]

In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Paul Charles Jacques Boulenger -
Hi!
Did you remove the clip_grad_value_ line that is in the provided PyTorch example?
I had a similar problem, and removing this line (or modifying it) more or less solved it.
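For context, here is a rough sketch of where that line sits in the optimization step, assuming your code follows the provided PyTorch example (the network, batch and clip value below are only placeholders):

import torch
import torch.nn as nn
import torch.nn.functional as F

policy_net = nn.Linear(4, 2)                           # stand-in for the Q-network
optimizer = torch.optim.Adam(policy_net.parameters(), lr=5e-3)

q_values = policy_net(torch.randn(8, 4))               # fake batch, for illustration only
targets = torch.randn(8, 2)
loss = F.smooth_l1_loss(q_values, targets)

optimizer.zero_grad()
loss.backward()
# The line in question: it clips every gradient entry to [-clip_value, clip_value].
# A very tight clip can stall learning, so try removing it or using a larger bound.
nn.utils.clip_grad_value_(policy_net.parameters(), 1.0)
optimizer.step()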
In reply to Paul Charles Jacques Boulenger

Re: Problem with loss for project DQN

by Lucas Antoine Reymond -
Yes, we tried with and without this line and we get almost the same loss.
In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Titouan Alexis Arthur Renard -
A lot of different things can cause such a problem when implementing DQN, but essentially what this indicates is that the algorithm fails at fitting the data: somehow the Q-value estimates are becoming more and more wrong. My two best guesses about what this could be (though this is by no means an exhaustive list of possibilities, I'm afraid) are the following:
1. Your loss is wrong: something is incorrect in the way your loss is computed, and hence you are fitting a quantity which is not the Q-values, so your algorithm does not converge to anything stable and your loss blows up. Note that it might be wrong in a tricky way (the line where you compute the loss might look right and still compute the wrong value); a common source of such mistakes is automatic broadcasting working in a way you did not intend (see the sketch at the end of this post).
2. Your data is not correctly processed. Something which can lead to such a loss blow-up is a mistake in the sampling of the data. For instance, a very common thing I have seen happen is passing the wrong reward to the loss (i.e. mixing up the state-action-reward tuples so that rewards end up associated with the wrong state-action pairs). This can also happen because of automatic broadcasting working in mysterious, unintended ways.
Sorry I'm not able to provide a nice and concise solution to your problem, but I think these two leads have a good chance of solving it. Best of luck!
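To make point 1 concrete, here is a minimal, self-contained illustration of the broadcasting pitfall (all tensors are random stand-ins, only the shapes matter):

import torch
import torch.nn.functional as F

batch, gamma = 2048, 0.9
q_pred = torch.randn(batch, 1)        # Q(s, a) gathered for the taken actions, shape [batch, 1]
q_next = torch.randn(batch)           # max_a' Q(s', a'), shape [batch]
rewards = torch.randn(batch)          # shape [batch]

target = rewards + gamma * q_next     # shape [batch]

# Silent bug: [batch, 1] vs [batch] broadcasts to [batch, batch],
# so the loss compares every prediction with every target.
wrong_loss = F.mse_loss(q_pred, target)

# Fix: make the shapes match before computing the loss.
right_loss = F.mse_loss(q_pred.squeeze(1), target)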
In reply to Titouan Alexis Arthur Renard

Re: Problem with loss for project DQN

by Lucas Antoine Reymond -
Thank you for the help! We fixed it and it works now.
In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Eva Cramatte -
We have the same problem with the loss. Could you explain how you fixed it, please?
In reply to Eva Cramatte

Re: Problem with loss for project DQN

by Lucas Antoine Reymond -
The way we processed the data was wrong: we were not using the correct observation when optimizing. I hope this helps!
In reply to Lucas Antoine Reymond

Re: Problem with loss for project DQN

by Camille Valentine Cathala -
Hello,
I am still having the same problem. I don't see where the processing of the observations can go wrong. Was it in the states you use for computing the Q-values?
Thanks in advance
In reply to Camille Valentine Cathala

Re: Problem with loss for project DQN

by Lucas Antoine Reymond -
Hello,
The mistake we had was that we put the wrong observation in the buffer: over the 30 weeks, we were always giving the observation from the 1st week to the buffer in the learning step. After fixing this mistake, the loss looked better.
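Roughly, the fix looked like this (the environment and policy below are fake stand-ins, since I cannot share the actual project code):

import random
from collections import deque

def fake_env_step(obs, action):
    # Stand-in for the project environment: returns (next_obs, reward, done).
    return [random.random() for _ in range(4)], random.random(), False

def select_action(obs):
    # Stand-in for an epsilon-greedy policy over 2 actions.
    return random.randrange(2)

buffer = deque(maxlen=20000)
obs = [random.random() for _ in range(4)]   # observation of week 1

for week in range(30):
    action = select_action(obs)
    next_obs, reward, done = fake_env_step(obs, action)
    # Our bug: we kept pushing the week-1 observation here at every step.
    # Correct version: push the observation of the current week.
    buffer.append((obs, action, reward, next_obs, done))
    obs = next_obs                          # move on to the next week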