Videos 4 | Moodle

1. First steps toward deep reinforcement learning

2. Basic idea of policy gradient

3. Example: Binary actor with 1-step horizon

4A. From batch to online: Log-likelihood trick

4B. Example (1-step horizon) revisited

4*. Quiz - Policy Gradient Methods

5. Policy gradient over multiple time steps

6. Subtracting the mean reward via the value function

6*. Quiz

© 2023 EPFL, all rights reserved