Mp2-3.4: Training batch when the episode is done

by Hyeongdon Moon,
Number of replies: 4

Hi, we have a question about the implementation instruction in section 3.4. We are wondering whether we should update on the batch at the end of an episode even if the batch is not full. Given the combination with the K-worker implementation, it seems natural to fill the batch up to the batch size before updating, but is this part of the documentation asking us to update when the episode terminates early? Thanks!

In reply to Hyeongdon Moon

Re: Mp2-3.4: Training batch when the episode is done

by Skander Moalla,
Hello,

I'm not sure I understand the question. What is one "experience"? What does it mean to fill the batch?

The task is to collect n steps from each of the K workers, resulting in a batch of K*n samples. Compute the (up-to-)n-step advantages on each sample in the batch and use all of them at once to compute the gradients.
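A minimal sketch of this collection loop, assuming a vectorized setup with K workers stepped in lockstep for n steps. `ToyEnv`, the variable names, and the reset-on-done convention are illustrative assumptions, not part of the assignment handout:

```python
import numpy as np

K = 4  # number of parallel workers (assumed value)
N = 6  # n-step rollout length (assumed value)

class ToyEnv:
    """Hypothetical stand-in environment: episode ends after a random horizon."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.reset()
    def reset(self):
        self.t = 0
        self.horizon = int(self.rng.integers(2, 8))
        return float(self.t)
    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done  # obs, reward, done

envs = [ToyEnv(s) for s in range(K)]
obs = [e.reset() for e in envs]

# Collect exactly N steps from each worker; if an episode ends mid-rollout,
# reset the env and keep collecting. The batch always holds K * N samples.
batch_obs, batch_rewards, batch_dones = [], [], []
for t in range(N):
    for k, env in enumerate(envs):
        o, r, done = env.step(action=0)
        batch_obs.append(obs[k])
        batch_rewards.append(r)
        batch_dones.append(done)
        obs[k] = env.reset() if done else o

assert len(batch_obs) == K * N  # one flat batch of K*n samples
```

Storing the `done` flags alongside the rewards is what later lets the advantage computation cut the bootstrap at episode boundaries.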
In reply to Skander Moalla

Re: Mp2-3.4: Training batch when the episode is done

by Hyeongdon Moon,
Oh, sorry for using terms from my personal implementation. By "experience" I meant one step. I just want to ask whether we should start a new batch when an episode terminates with fewer than n steps in the batch. If n=6 and episode A ends after 3 steps, and the next episode is B, is it okay to put [A1,A2,A3,B1,B2,B3] in a single batch, or should I split it into [A1,A2,A3] and [B1,B2,B3]?
In reply to Hyeongdon Moon

Re: Mp2-3.4: Training batch when the episode is done

by Skander Moalla,
Thanks for the clarification. Yes, all n steps should be included even if the env is reset in between. In that case, you would include the A and B samples as you describe, and would have to bootstrap correctly so that samples from B don't end up in the value estimation of the samples from A.
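One way to implement that bootstrap cut is a backward recursion masked by the `done` flags: a `done` at step i zeroes the discounted tail, so returns for A1..A3 never see rewards or the bootstrap value from B. This is a sketch under those assumptions (function name, gamma value, and the toy numbers are illustrative):

```python
import numpy as np

def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """(Up-to-)n-step returns for one worker's rollout of length n.

    rewards, dones: arrays of shape (n,) for steps t .. t+n-1.
    bootstrap_value: V(s_{t+n}), used only for the part of the rollout
    after the last reset; the (1 - done) mask cuts the recursion at
    every episode boundary.
    """
    n = len(rewards)
    returns = np.empty(n)
    R = bootstrap_value
    for i in reversed(range(n)):
        # done[i] == 1 means step i was terminal: the return at i is just
        # the reward, and nothing from later steps (episode B) leaks back.
        R = rewards[i] + gamma * R * (1.0 - dones[i])
        returns[i] = R
    return returns

# Example: n=6, episode A ends at step 3 (done=1), episode B fills steps 4-6.
rewards = np.array([1., 1., 1., 1., 1., 1.])
dones   = np.array([0., 0., 1., 0., 0., 0.])
rets = n_step_returns(rewards, dones, bootstrap_value=10.0, gamma=0.5)
# rets -> [1.75, 1.5, 1.0, 3.0, 4.0, 6.0]
# A's returns (first three) only sum A's rewards; the bootstrap value 10.0
# contributes only to the B part of the rollout.
```

The advantage for each sample is then `returns[i] - V(s_i)` as usual; the masking alone is what keeps the two episodes separate inside the shared batch.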