Hello,
The project description says we need to create an Agent class with the following functions:
- observe(self, state, action, next_state, reward) : called upon observing a new transition of the environment.
- select_action(self, state) : pick an action from the given state.
- update(self) : called after each environment step. This is where all the training takes place.
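For context, here is a minimal sketch of how I currently understand the interface (the method names and docstrings come from the project description; the base-class bodies are just my assumption):

```python
class Agent:
    """Base class sketch for the interface described above."""

    def observe(self, state, action, next_state, reward):
        """Called upon observing a new transition of the environment."""
        raise NotImplementedError

    def select_action(self, state):
        """Pick an action from the given state."""
        raise NotImplementedError

    def update(self):
        """Called after each environment step; all training happens here."""
        raise NotImplementedError


class RandomAgent(Agent):
    """Example concrete agent: ignores observations, acts uniformly at random."""

    def __init__(self, n_actions, seed=0):
        import random
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def observe(self, state, action, next_state, reward):
        pass  # a random policy has nothing to learn from transitions

    def select_action(self, state):
        return self.rng.randrange(self.n_actions)

    def update(self):
        pass  # no training step for a random policy
```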
1. I am not sure how the observe function is supposed to work. If we already pass in state, action, next_state, and reward as arguments, what is left for it to observe?
2. Should we create separate classes for the different algorithms (DQN, Dyna, and Random), since the update and select_action functions have to be implemented differently for each of them?
Thank you.