Final 2021, exo 4.b

by Marina Rapellini -

Hi

Could you please explain how to obtain the value of m that minimises the bound on E[R]? And then how do we obtain the final bound for E[R]?

Thank you

In reply to Marina Rapellini

Re: Final 2021, exo 4.b

by Thomas Weinberger -
Hello Marina,

Due to the imposed Lipschitzness of the reward, the reward around each grid point i \delta - \delta/2 changes at most linearly in the distance from that point. Moreover, since the grid points are spaced a distance of 1/m apart, we can approximate the regret within each interval by the regret at the grid point itself, being off by at most \mathcal{O}(n/m) over the n rounds.
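To spell out where the \mathcal{O}(n/m) comes from (a sketch; I write L for the Lipschitz constant, which is my own notation since the exact constant from the exam statement isn't quoted here):

```latex
% Any point x in the interval around grid point x_i is within 1/(2m) of x_i,
% so by L-Lipschitzness of the reward r:
\[
  |r(x) - r(x_i)| \;\le\; L\,|x - x_i| \;\le\; \frac{L}{2m} .
\]
% Summing this per-round approximation error over the n rounds gives
\[
  n \cdot \frac{L}{2m} \;=\; \mathcal{O}\!\left(\frac{n}{m}\right).
\]
```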

The only thing that remains is then to add this approximation error to the bound obtained in (i), plug in |C| = m, and optimize over m (set the derivative with respect to m equal to zero).
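Concretely, assuming the bound from part (i) has the form \sqrt{k\,|C|\,n \log n} with k a constant (my reading of the thread; constants are dropped throughout), the optimization looks like:

```latex
\[
  f(m) \;=\; \sqrt{k\, m\, n \log n} \;+\; \frac{n}{m},
  \qquad
  f'(m) \;=\; \frac{1}{2}\sqrt{\frac{k\, n \log n}{m}} \;-\; \frac{n}{m^{2}} \;=\; 0 ,
\]
\[
  \Longrightarrow\quad m^{3/2} \;=\; 2\sqrt{\frac{n}{k \log n}}
  \quad\Longrightarrow\quad
  m \;=\; \Theta\!\left(\left(\frac{n}{\log n}\right)^{1/3}\right),
  \qquad
  \mathbb{E}[R] \;=\; \mathcal{O}\!\left(n^{2/3} (\log n)^{1/3}\right).
\]
```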

Cheers,
Thomas
In reply to Thomas Weinberger

Re: Final 2021, exo 4.b

by Thomas Weinberger -
Never mind, sqrt(m) is concave, so setting the first derivative to 0 doesn't work. I will have another look later!
In reply to Thomas Weinberger

Re: Final 2021, exo 4.b

by Thomas Weinberger -
Hi Marina,

My first guess was actually correct. Applying the first-order optimality condition is permitted here because, asymptotically (for n \rightarrow \infty), the term \mathcal{O}(n/m) (which is convex in m!) dominates. Hence the objective function is, informally speaking, "asymptotically convex", and setting the first derivative to zero is both necessary and sufficient for obtaining the (asymptotically) optimal choice of m.
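One way to make the "asymptotically convex" statement concrete (a sketch, writing the objective as in the derivative below, with k a constant from part (i) and constants otherwise dropped): the second derivative is strictly positive at the stationary point, so that point is indeed a minimum.

```latex
\[
  f(m) \;=\; \sqrt{k\, m\, n \log n} + \frac{n}{m},
  \qquad
  f''(m) \;=\; -\frac{1}{4}\sqrt{\frac{k\, n \log n}{m^{3}}} + \frac{2n}{m^{3}} .
\]
% Plugging in the stationary point m^{3/2} = 2\sqrt{n/(k \log n)}:
\[
  f''(m^{\ast}) \;=\; \frac{k \log n}{2} - \frac{k \log n}{8}
  \;=\; \frac{3\, k \log n}{8} \;>\; 0 .
\]
```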

Best,
Thomas
In reply to Thomas Weinberger

Final 2021, exo 4.b

by Marina Rapellini -
Hi
I am trying again to do this computation, but I am not able to compute the derivative. Could you help me out? Thank you
In reply to Marina Rapellini

Re: Final 2021, exo 4.b

by Thomas Weinberger -
Dear Marina,

Ignoring the big-O notation for a moment, the first derivative is \frac{1}{2}\sqrt{ \frac{k n \log(n)} {m} } - \frac{n}{m^2}. Setting this to zero and solving for m yields the optimal choice of m (as explained above, the first-order optimality condition applies here even though the function is, strictly speaking, not convex).
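If it helps to sanity-check this numerically, here is a small sketch (the constant k from the derivative above is set to 1, and the exact constants inside the bound are my assumption, not taken from the exam):

```python
import math

def bound(m, n, k=1.0):
    # Sketch of the objective discussed in the thread: discretized regret
    # bound sqrt(k * m * n * log n) plus the Lipschitz approximation
    # error n / m (constants dropped / assumed).
    return math.sqrt(k * m * n * math.log(n)) + n / m

def m_star(n, k=1.0):
    # Closed-form stationary point of the derivative quoted above:
    # (1/2) sqrt(k n log n / m) = n / m^2  =>  m^(3/2) = 2 sqrt(n / (k log n))
    return (2.0 * math.sqrt(n / (k * math.log(n)))) ** (2.0 / 3.0)

n = 10**6
ms = m_star(n)

# Grid search around m*: the closed-form value should minimize the bound.
grid = [ms * (0.5 + 0.01 * i) for i in range(101)]
best = min(grid, key=lambda m: bound(m, n))
print(round(ms, 2), round(best, 2))
```

The grid minimizer coinciding with the closed-form m* is exactly the first-order optimality condition at work.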

Best,

Thomas