Hello,
I have a question about the Dyna algorithm in the project (with the Mountain Car environment). When we update Model(S, A) with R and S', I struggle to see what should be done. Is there a specific dynamic update rule we need to compute for the transition probabilities and rewards? Or do we directly assign R to R(S, A)? Same question for the probabilities. In other words, is it similar to the policy updates in AlphaZero, for example?
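
To make the question concrete, here is a minimal sketch of the two options I have in mind. It assumes the Mountain Car state has already been discretized into a hashable value (a tuple of bin indices, for instance); the class and method names are hypothetical and not taken from the project code.

```python
from collections import defaultdict


class DirectModel:
    """Option 1: directly assign (R, S') to Model(S, A).

    Assumes deterministic transitions, so the latest observation
    simply overwrites whatever was stored before.
    """

    def __init__(self):
        self.table = {}

    def update(self, s, a, r, s_next):
        self.table[(s, a)] = (r, s_next)

    def predict(self, s, a):
        return self.table[(s, a)]


class CountModel:
    """Option 2: keep counts and estimate probabilities and mean reward.

    Transition probabilities are empirical frequencies of S' given (S, A);
    the reward estimate is the average of observed rewards for (S, A).
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        n = self.visits[(s, a)]
        return {sn: c / n for sn, c in self.counts[(s, a)].items()}

    def mean_reward(self, s, a):
        return self.reward_sum[(s, a)] / self.visits[(s, a)]
```

With the first option the planning step would replay the single stored (R, S') for a sampled (S, A); with the second it would sample S' from the estimated probabilities and use the averaged reward.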
Thank you