Hello,
I would like to know if my expression for the worst-case regret is correct. If I call the Bernoulli random variable with parameter , describing the toss of the coin at round , then :
where is the random variable that determines, at round , which arm I choose and , knowing that I call the random variable that I get at round $i$ from arm . Then the goal is to compute . My idea is to condition on the values of :
One has since if at round the toss is a success, then I choose uniformly over all the arms. Now where .
So with these two results
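To make the conditioning step concrete, here is a sketch under assumed notation, taking this to be the usual $\varepsilon$-greedy setup: $K$ arms, a coin $Z_t \sim \mathrm{Bernoulli}(\varepsilon_t)$ at round $t$, and $I_t$ the arm chosen at round $t$. The symbols $K$, $\varepsilon_t$, $Z_t$ and $I_t$ are names I am assuming for illustration. By the law of total probability,

$$
\mathbb{P}(I_t = i) = \mathbb{P}(I_t = i \mid Z_t = 1)\,\mathbb{P}(Z_t = 1) + \mathbb{P}(I_t = i \mid Z_t = 0)\,\mathbb{P}(Z_t = 0) = \frac{\varepsilon_t}{K} + (1 - \varepsilon_t)\,\mathbb{P}(I_t = i \mid Z_t = 0).
$$

The first term uses only that a successful toss leads to a uniform choice over the $K$ arms. The remaining factor $\mathbb{P}(I_t = i \mid Z_t = 0)$ is the probability that arm $i$ is selected when the toss fails, and it depends on the rule used in that case.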
I would like to know if my reasoning so far is correct. Any help would be appreciated, thank you! In particular, I don't really know what to do with the last term .