Project- Dyna algo model

by Alexi Semiz -

Hello,
I have a question about the Dyna algorithm in the project (with the Mountain Car environment). When we update Model(S,A) with R and S', I struggle to see what should be done. Is there a specific dynamic update we need to compute for the probabilities and rewards? Or do we directly assign R to R(S,A)? Same question for the probabilities. In other words, is it similar to policy updates in AlphaZero, for example?

Thank you 

In reply to Alexi Semiz

Re: Project- Dyna algo model

by Lucas Louis Gruaz -
Hello,
As explained in section "4.2 Model building" of the PDF, your agent will build a (simple) model of the rewards and transition probabilities. For each state-action pair, the model should capture the estimated probability of each possible transition and the expected reward. Since the rewards and transitions may be stochastic, these estimates must be updated with each new observation rather than overwritten by the latest one.
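To make this concrete, here is a minimal sketch of one common way to keep such estimates: visit counts for the transitions and a running mean for the reward. It is not taken from the project PDF. The class name `TabularModel`, its methods, and the assumption that states have already been discretized into integer indices are all hypothetical choices for illustration; section 4.2 remains the reference for what is actually required.

```python
import numpy as np


class TabularModel:
    """Count-based estimate of P(s' | s, a) and of the expected reward R(s, a).

    Assumption: states and actions are integer indices (for Mountain Car,
    this means the continuous state was discretized beforehand).
    """

    def __init__(self, n_states, n_actions):
        # counts[s, a, s'] = number of times (s, a) was followed by s'
        self.counts = np.zeros((n_states, n_actions, n_states))
        # reward_mean[s, a] = running average of rewards observed after (s, a)
        self.reward_mean = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        """Fold one observed transition (s, a, r, s_next) into the estimates."""
        self.counts[s, a, s_next] += 1
        n = self.counts[s, a].sum()
        # Incremental mean: new_mean = old_mean + (r - old_mean) / n
        self.reward_mean[s, a] += (r - self.reward_mean[s, a]) / n

    def transition_probs(self, s, a):
        """Return the estimated distribution over next states for (s, a)."""
        n = self.counts[s, a].sum()
        n_states = self.counts.shape[2]
        if n == 0:
            # No data yet: fall back to a uniform guess (an arbitrary choice).
            return np.full(n_states, 1.0 / n_states)
        return self.counts[s, a] / n


# Example: three observations of the same state-action pair.
model = TabularModel(n_states=3, n_actions=2)
model.update(0, 1, -1.0, 1)
model.update(0, 1, -1.0, 1)
model.update(0, 1, 0.0, 2)
probs = model.transition_probs(0, 1)
```

With this kind of update, R(S,A) is not simply overwritten by the latest R: each observation shifts the average slightly, and the transition probabilities are the normalized counts. If the environment happened to be deterministic, the averages would coincide with direct assignment anyway.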

I hope this answers your question.