Hello,
1. The idea is that the transition (state, action, next_state, reward) will be generated by the environment, and the observe function defines what the agent does upon observing the transition.
2. Yes, you should create different classes.
I hope it's clear, tell me otherwise.
1. The idea is that the transition (state, action, next_state, reward) will be generated by the environment, and the observe function defines what the agent does upon observing the transition.
2. Yes, you should create different classes.
I hope it's clear, tell me otherwise.