Mp2-3.4: Training batch when the episode is done