CS-456: Project- Dyna algo model

Hello,
For the Dyna algorithm in the project (with the Mountain Car environment), I struggle to see what should be done when we update Model(S,A) with R and S'. Is there a specific dynamic update we need to compute for the probabilities and rewards? Or do we directly assign R to R(S,A)? Same question for the probabilities. In other words, is it similar to the policy updates in AlphaZero, for example?

Thank you

Re: Project- Dyna algo model

by Lucas Louis Gruaz - Monday, 6 May 2024, 13:28

Hello,
As explained in section "4.2 Model building" of the PDF, your agent will build a (simple) model of the rewards and transition probabilities. For each state and action, the agent's model should capture the estimated probability of each transition and the expected reward. Since the rewards and transitions may be stochastic, these estimates cannot simply be overwritten with the latest R and S'; they must be updated with each new observation.
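
To make the idea of an estimate that is updated with each observation concrete, here is a minimal sketch of one common way to do it: count the observed transitions and keep a running mean of the rewards. This is not necessarily the exact scheme described in the project PDF. It assumes the states have already been mapped to integer indices (for example by discretizing the Mountain Car state), and the names `TabularModel`, `update`, `p_hat` and `r_hat` are hypothetical and chosen only for illustration.

```python
import numpy as np


class TabularModel:
    """Count-based estimate of P(s' | s, a) and of the mean reward R(s, a).

    Assumption: states and actions are integer indices (e.g. discretized bins).
    """

    def __init__(self, n_states, n_actions):
        # counts[s, a, s'] = number of times (s, a) was followed by s'
        self.counts = np.zeros((n_states, n_actions, n_states))
        # r_hat[s, a] = running mean of the rewards observed after (s, a)
        self.r_hat = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        """Fold one observed transition (s, a, r, s_next) into the model."""
        self.counts[s, a, s_next] += 1
        n_sa = self.counts[s, a].sum()
        # Incremental mean: new_mean = old_mean + (r - old_mean) / n
        self.r_hat[s, a] += (r - self.r_hat[s, a]) / n_sa

    def p_hat(self, s, a):
        """Estimated distribution over next states for the pair (s, a)."""
        n_sa = self.counts[s, a].sum()
        n_states = self.counts.shape[2]
        if n_sa == 0:
            # Assumption: fall back to a uniform guess for unvisited pairs.
            return np.full(n_states, 1.0 / n_states)
        return self.counts[s, a] / n_sa


# Usage: three observations of the same (state, action) pair.
model = TabularModel(n_states=3, n_actions=2)
model.update(0, 1, -1.0, 1)
model.update(0, 1, -1.0, 2)
model.update(0, 1, 0.0, 2)
print(model.p_hat(0, 1))    # estimated transition probabilities from (0, 1)
print(model.r_hat[0, 1])    # estimated mean reward for (0, 1)
```

With this kind of update, directly assigning R to R(S,A) would only be correct if the rewards were deterministic; the running mean covers the stochastic case as well.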

I hope this answers your question.