Final 2021, exo 4.b

by Marina Rapellini
Number of replies: 5

Hi

Could you please explain how to obtain the value of m that minimises the bound on E[R]? And then how do we obtain the final bound on E[R]?

Thank you

In reply to Marina Rapellini

Re: Final 2021, exo 4.b

by Thomas Weinberger
Hello Marina,

Due to the imposed Lipschitzness of the reward, the reward within the interval around each grid point i\delta - \delta/2 changes at most linearly in the distance to that grid point. Moreover, since the grid points are spaced at a distance of 1/m apart (\delta = 1/m), we can approximate the regret within each interval by the regret at the grid point itself, being off by at most \mathcal{O}(1/m) per round, i.e. \mathcal{O}(n/m) over the n rounds.
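
To make the \mathcal{O}(n/m) explicit (a quick sketch in my own notation, with L the Lipschitz constant and \mu the mean reward function): for any x in the interval around the grid point x_i = i\delta - \delta/2, Lipschitzness gives |\mu(x) - \mu(x_i)| \le L |x - x_i| \le L\delta/2 = \frac{L}{2m}. In particular, the best grid point is at most \frac{L}{2m} worse than the best point overall, so restricting play to the grid costs at most \frac{Ln}{2m} = \mathcal{O}(n/m) extra regret over n rounds.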

The only thing that remains is then to add this approximation error to the bound obtained in (i), plug in |C| = m, and optimize over m (set the derivative with respect to m equal to 0).
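
Concretely, if the bound from (i) is of the form \mathcal{O}(\sqrt{|C| n \log(n)}) (which I assume here, suppressing constants), then plugging in |C| = m and adding the approximation error leaves you with minimizing f(m) = \sqrt{m n \log(n)} + \frac{n}{m} over m.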

Cheers,
Thomas
In reply to Thomas Weinberger

Re: Final 2021, exo 4.b

by Thomas Weinberger
Never mind, \sqrt{m} is concave, so setting the first derivative = 0 doesn't work. I will have another look later!
In reply to Thomas Weinberger

Re: Final 2021, exo 4.b

by Thomas Weinberger
Hi Marina,

My first guess was actually correct. Applying the first-order optimality condition is permitted here because, asymptotically (for n \rightarrow \infty), the term \mathcal{O}(n/m) (which is convex in m!) dominates. Hence the objective function is (informally speaking) "asymptotically convex", and setting the first derivative to zero is necessary and sufficient for obtaining the (asymptotically) optimal choice of m. More directly: the objective tends to infinity both as m \rightarrow 0 and as m \rightarrow \infty, and it has a unique stationary point on (0, \infty), so that stationary point must be the global minimizer.
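
If you want a numerical sanity check, here is a small Python sketch I put together (not part of the official solution; n = 10^6 is an arbitrary choice, and the constant 2^{2/3} is the outcome of the first-order condition spelled out below):

import math

n = 10**6  # an arbitrary large horizon, chosen only for illustration

def f(m):
    # the bound after plugging in |C| = m: exploration term + discretization error
    return math.sqrt(m * n * math.log(n)) + n / m

# stationary point predicted by the first-order condition f'(m) = 0
m_star = 2 ** (2 / 3) * (n / math.log(n)) ** (1 / 3)

# brute-force minimizer over a grid from 0.02 * m_star to 10 * m_star
m_num = min((0.01 * i * m_star for i in range(2, 1001)), key=f)

print("stationary point:", m_star)
print("numerical argmin:", m_num)

The numerical argmin coincides with m_star up to the grid resolution, which supports using the first-order condition despite the concave square-root term.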

Best,
Thomas
In reply to Thomas Weinberger

Re: Final 2021, exo 4.b

by Marina Rapellini
Hi
I am trying to do this computation again, but I am not able to compute the derivative. Can you help me out? Thank you
In reply to Marina Rapellini

Re: Final 2021, exo 4.b

by Thomas Weinberger
Dear Marina,

Ignoring the big-O notation for a moment, and with |C| = m already plugged in, the quantity to minimize is \sqrt{m n \log(n)} + \frac{n}{m}, whose first derivative is \frac{1}{2}\sqrt{ \frac{n \log(n)}{m} } - \frac{n}{m^2}. Setting this to zero and solving for m, you obtain the optimal choice of m (as I explained above, the first-order optimality condition holds even though the function is, strictly speaking, not convex).
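
Spelling out the remaining algebra (still ignoring constants): setting \frac{1}{2}\sqrt{\frac{n \log(n)}{m}} = \frac{n}{m^2} gives m^{3/2} = 2\sqrt{n/\log(n)}, i.e. the optimal choice m = \Theta\left( (n/\log(n))^{1/3} \right). Plugging this back in, both \sqrt{m n \log(n)} and n/m are of order n^{2/3} (\log n)^{1/3}, so the final bound is \mathbb{E}[R] = \mathcal{O}\left( n^{2/3} (\log n)^{1/3} \right).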

Best,

Thomas