Hi,
Remember that V_theta(s_0) depends only on s_0: computing the V-value already takes the expectation over the state-action sequence sampled under theta, conditioned on the initial state s_0. So when we go from Eq. (4) to Eq. (5) and take the expectation of V_theta(s_0) over the state-action sequence sampled under theta' instead, the value of that expectation does not change. Everything in the sequence after s_0 is integrated out, and the distribution of the initial state s_0 is the same no matter which policy we follow.
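In case it helps, here is the step written out as a short sketch. The notation is an assumption on our side, since the equations are not reproduced in this post: p(s_0) is the initial state distribution, tau = (s_0, a_0, s_1, a_1, ...) is the state-action sequence, and p_theta'(tau) is its distribution under theta'.

```latex
\begin{align*}
E_{\tau \sim p_{\theta'}(\tau)}\big[ V_\theta(s_0) \big]
  &= E_{s_0 \sim p(s_0)}\Big[ E_{a_0, s_1, a_1, \dots \sim p_{\theta'}(\cdot \mid s_0)}\big[ V_\theta(s_0) \big] \Big] \\
  &= E_{s_0 \sim p(s_0)}\big[ V_\theta(s_0) \big]
\end{align*}
```

The inner expectation drops out because V_theta(s_0) is constant with respect to everything after s_0, and the outer one is unchanged because p(s_0) does not depend on theta or theta'.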
Hope it's clearer now!
Best,
Your TAs