CS-233: [HW3] Gradient Descent vs SGD

Hey,

Thanks for the remark. Indeed, the gradient is computed at a single sample at each step, so the method used here is SGD. Each update is therefore taken with respect to one observation, the i-th sample of the dataset.
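For reference, here is a sketch of both updates side by side in standard notation. The symbols (w for the weights, γ for the step size, N for the number of samples, ℓ_i for the loss on the i-th sample) are assumptions and may not match the notation used in the course.

```latex
% Full-batch gradient descent: one step uses the gradient of the average loss over all N samples
w^{(t+1)} = w^{(t)} - \gamma \, \nabla L\big(w^{(t)}\big),
\qquad L(w) = \frac{1}{N} \sum_{i=1}^{N} \ell_i(w)

% Stochastic gradient descent: one step uses the gradient at a single sample i
w^{(t+1)} = w^{(t)} - \gamma \, \nabla \ell_i\big(w^{(t)}\big),
\qquad i \sim \mathrm{Uniform}\{1, \dots, N\}
```

Since i is drawn uniformly, the average of the per-sample gradients ∇ℓ_i over i equals ∇L, so each SGD step is an unbiased (but noisy) estimate of the GD step at a fraction of the cost.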

SGD update formula
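A minimal NumPy sketch of the difference, assuming a linear least-squares model with per-sample loss ½(x_iᵀw − y_i)². The model, the function names (`per_sample_grad`, `gd_step`, `sgd_step`, `gd`, `sgd`) and the step sizes are hypothetical choices for illustration, not the homework's actual setup. The only thing that changes between the two methods is how much of the dataset enters each gradient.

```python
import numpy as np


def per_sample_grad(w, x_i, y_i):
    """Gradient of the (assumed) squared loss 0.5 * (x_i @ w - y_i)**2 at one sample."""
    return (x_i @ w - y_i) * x_i


def gd_step(w, X, y, lr):
    """Full-batch GD: average the gradient over all N samples, then update once."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad


def sgd_step(w, X, y, lr, rng):
    """SGD: draw one index i uniformly and update using only that sample's gradient."""
    i = rng.integers(len(y))
    return w - lr * per_sample_grad(w, X[i], y[i])


def gd(X, y, lr=0.1, n_steps=500):
    """Run full-batch GD from w = 0 (hypothetical defaults)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w = gd_step(w, X, y, lr)
    return w


def sgd(X, y, lr=0.05, n_steps=5000, seed=0):
    """Run SGD from w = 0; many cheap single-sample steps instead of few full ones."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w = sgd_step(w, X, y, lr, rng)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true  # noiseless targets, so both methods can recover w_true
    print("GD :", gd(X, y))
    print("SGD:", sgd(X, y))
```

Each `gd_step` touches all 200 rows of `X`, while each `sgd_step` touches a single row, which is why SGD is typically run for many more (but much cheaper) iterations.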

Best,

Firas