
Re: DDPG Project: exploding gradient

by Sophia Becker
It is not necessary to include any additional techniques like the ones you mentioned to make the training errors of the Q-network + heuristic policy converge. Before you do any debugging, make sure that you run your algorithm for sufficiently many episodes - it is expected that the errors initially increase quite a bit before they settle at a stable value (200 episodes of 200 steps each should be enough to see the errors decrease).

If your errors actually explode, something is probably wrong in your implementation of the Q-network or the heuristic policy. Here are some common sources of error that you can check:
1) Gradient tracking that you forgot to disable (e.g. the TD target should be computed inside a torch.no_grad() block; see the sketch after this list)
2) Mismatched tensor dimensions in the inputs to your network (e.g. make sure you don't confuse the batch dimension with the remaining dimensions; broadcasting means the gradients may still be computed, just incorrectly, so your program doesn't throw an error)
3) Typos and small errors in the logic of the algorithm (e.g. the computation of the error, the architecture of the network, ...)
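In case it helps, here is a minimal sketch of points 1) and 2) for a single Q-network update in PyTorch. All names (q_net, heuristic_policy, the dimensions, the done flags) are placeholders for illustration, not the project's actual interface; adapt them to your own code.

```python
import torch
import torch.nn as nn

# Hypothetical setup: states of dim 3, scalar actions, batch along dim 0.
state_dim, action_dim, batch_size, gamma = 3, 1, 64, 0.99

q_net = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1)
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def heuristic_policy(states):
    # Placeholder for your heuristic policy: one action per state in the batch.
    return torch.zeros(states.shape[0], action_dim)

# A sampled minibatch of transitions (dummy data here).
states = torch.randn(batch_size, state_dim)
actions = torch.randn(batch_size, action_dim)
rewards = torch.randn(batch_size, 1)   # keep shape (batch, 1), not (batch,)
next_states = torch.randn(batch_size, state_dim)
dones = torch.zeros(batch_size, 1)     # 1.0 marks terminal transitions

# Point 1: compute the TD target WITHOUT tracking gradients.
with torch.no_grad():
    next_actions = heuristic_policy(next_states)
    next_q = q_net(torch.cat([next_states, next_actions], dim=1))
    target = rewards + gamma * (1 - dones) * next_q  # shape (batch, 1)

# Point 2: the current Q-values must have the same shape as the target.
q_values = q_net(torch.cat([states, actions], dim=1))  # shape (batch, 1)
assert q_values.shape == target.shape

loss = nn.functional.mse_loss(q_values, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The assert catches the most common silent failure: if q_values had shape (batch,) while target had shape (batch, 1), broadcasting would average a (batch, batch) matrix of pairwise errors instead of the per-sample errors, and training would quietly misbehave without any exception.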

If you cannot solve the issue, make sure to ask about it in the next exercise session on Tuesday. Good luck!