Hello,
I am having trouble understanding why the solution to the 2022 exam, question 4) (iii), has a negative sign. Does that contradict the Sutton and Barto algorithm given in class?
My intuition is that if delta is positive, the agent did something that gave a larger reward than anticipated, so we want to ascend the gradient for that particular action.
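To make my intuition concrete, here is a minimal sketch of what I have in mind. Everything in it is my own assumption and not taken from the exam: a softmax policy over two actions with preferences `prefs`, a step size `alpha`, and the hypothetical function names `ascent_step` and `descent_step_on_loss`. It also shows the only explanation I could come up with for a minus sign: if the update is written as gradient descent on a loss `-delta * log pi(a)`, it gives the same step as gradient ascent on `delta * log pi(a)`. I do not know whether that is what the solution intends.

```python
import math

def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_pi(prefs, action):
    # For a softmax policy: d/d prefs[i] of log pi(action) = 1[i == action] - pi[i].
    pi = softmax(prefs)
    return [(1.0 if i == action else 0.0) - pi[i] for i in range(len(prefs))]

def ascent_step(prefs, action, delta, alpha=0.1):
    # Gradient ascent form: prefs <- prefs + alpha * delta * grad log pi(action).
    g = grad_log_pi(prefs, action)
    return [p + alpha * delta * gi for p, gi in zip(prefs, g)]

def descent_step_on_loss(prefs, action, delta, alpha=0.1):
    # Gradient descent on the (assumed) loss L = -delta * log pi(action).
    # grad L = -delta * grad log pi(action), so prefs <- prefs - alpha * grad L.
    loss_grad = [-delta * gi for gi in grad_log_pi(prefs, action)]
    return [p - alpha * lg for p, lg in zip(prefs, loss_grad)]

prefs = [0.0, 0.0]  # both actions equally likely at the start
up = ascent_step(prefs, action=0, delta=1.0)
down = descent_step_on_loss(prefs, action=0, delta=1.0)

print("P(action 0) before:", softmax(prefs)[0])
print("P(action 0) after ascent step:", softmax(up)[0])
print("P(action 0) after descent-on-loss step:", softmax(down)[0])
```

With a positive delta, both forms raise the probability of the action that was taken, and with a negative delta both lower it, so under this reading the sign would only depend on whether the objective is written as something to maximize or as a loss to minimize.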
Thank you