Videos 4
1. First steps toward deep reinforcement learning
2. Basic idea of policy gradient
3. Example: Binary actor with 1-step horizon
4A. From batch to online: Log-likelihood trick
4B. Example (1-step horizon) revisited
4*. Quiz - Policy Gradient Methods
5. Policy gradient over multiple time steps
6. Subtracting the mean reward via the value function
6*. Quiz
© 2023 EPFL, all rights reserved