A detail in the proof of Lemma 6.1

A detail in the proof of Lemma 6.1

by Andreea-Alexandra Musat -
Number of replies: 3

Hello, 


Part of the proof of Lemma 6.1 is to get a bound on  \mathbb{P}(\hat{\mu}_t + \sqrt{\frac{2a}{t}} \geq \epsilon ) (*). 

In the script (page 68), it says that "for  t no more than  \frac{2a}{\epsilon^2} , this probability is very close to 1 ". 

If  t \leq \frac{2a}{\epsilon^2} , then  \sqrt{\frac{2a}{t}} \geq \epsilon , so the probability above (*) becomes  \mathbb{P}( \hat{\mu}_t \geq 0) .

 However, I don't see why this is very close to 1. Could you help me with an explanation?


Cheers,

Andreea

In reply to Andreea-Alexandra Musat

Re: A detail in the proof of Lemma 6.1

by Thomas Weinberger -
Hello Andreea,

Good question. It seems like in this step it is implicitly assumed that the rewards X_i are non-negative, which however is wlog. (by recentering argument, i.e., adding the same constant to the rewards of all arms, such that they are all non-negative w.p. 1). If this is assumed, then we have also that \hat{mu}_t is non-negative w.p. 1.

From a quick glance on the notes I can see that non-negativity of X_i is indeed not clearly stated, so I will discuss with Prof. Urbanke to perhaps add a remark in the script.

Best,
Thomas
In reply to Thomas Weinberger

Re: A detail in the proof of Lemma 6.1

by Andreea-Alexandra Musat -
Great, thank you very much for your reply!
In reply to Andreea-Alexandra Musat

Re: A detail in the proof of Lemma 6.1

by Thomas Weinberger -
Update: I was a bit imprecise with my centering argument: of course we cannot assume non-negativity of the RVs themselves (we cannot "center" unbounded RVs like Gaussians), but we can make their expectations non-negative wlog. . Then we have that \mathbb{P}(\hat{\mu}_k- \mu_k \geq \epsilon) \leq \mathbb{P}(\hat{\mu}_k \geq \epsilon)\leq \exp(- \dots).