Hello,
I would like to know if my expression for the worst-case regret is correct. If I call the Bernoulli random variable of parameter
, describing the toss of the coin at round
, then:
where is the random variable that determines, at round
, which arm I choose and
, knowing that I call
the random variable that I get at round $i$ from arm
. Then the goal is to compute
. My idea is to condition on the values of
:
One has since if at round
the toss is a success, then I choose uniformly over all the
arms. Now
where
.
So with these two results
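For reference, the conditioning step can be written out with the law of total expectation. The notation here is assumed for this sketch: $B_t \sim \mathrm{Bernoulli}(\varepsilon)$ is the coin toss at round $t$, independent of the past and of the uniform draw; $K$ is the number of arms; $\Delta_a = \mu^* - \mu_a$ is the gap of arm $a$; $\hat a_t$ is the greedy arm computed from the first $t-1$ rounds; and the regret is taken to be $R_T = \sum_{t=1}^{T}\Delta_{A_t}$.

$$
\begin{aligned}
\mathbb{E}[R_T] &= \sum_{t=1}^{T} \mathbb{E}\big[\Delta_{A_t}\big],\\
\mathbb{E}\big[\Delta_{A_t}\big] &= \varepsilon\,\mathbb{E}\big[\Delta_{A_t}\mid B_t=1\big] + (1-\varepsilon)\,\mathbb{E}\big[\Delta_{A_t}\mid B_t=0\big]\\
&= \frac{\varepsilon}{K}\sum_{a=1}^{K}\Delta_a + (1-\varepsilon)\sum_{a=1}^{K}\Delta_a\,\mathbb{P}\big(\hat a_t = a\big).
\end{aligned}
$$

The last equality uses that $A_t$ is uniform when $B_t = 1$, and that $A_t = \hat a_t$ when $B_t = 0$, with $\hat a_t$ independent of $B_t$. Summed over $t$, the exploration part contributes exactly $\frac{\varepsilon T}{K}\sum_{a}\Delta_a$, so only the greedy part depends on how the empirical means behave.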
I would like to know if my reasoning so far is correct; any help would be appreciated, thank you! In particular, I don't really know what to do with the last term .
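As a numerical sanity check on the last term, one option is to estimate it by simulation. The sketch below assumes the algorithm is ε-greedy on Bernoulli arms with means `means` and exploration probability `eps`, with the greedy arm $\hat a_t$ being the arm of highest empirical mean and $\Delta_a = \mu^* - \mu_a$. The helper name `split_regret_terms`, scoring unpulled arms as 0, and breaking ties toward the lowest index are hypothetical choices, not part of the question. It returns the exploration part $\frac{\varepsilon}{K}\sum_a \Delta_a$ exactly, and a Monte Carlo estimate of the greedy part $(1-\varepsilon)\,\mathbb{E}[\Delta_{\hat a_t}]$ for each round.

```python
import random


def split_regret_terms(means, eps, horizon, n_runs=2000, seed=0):
    """Split the per-round expected regret of a hypothetical epsilon-greedy agent
    on Bernoulli arms into an exploration part and an exploitation part.

    Returns (explore_term, exploit_terms): explore_term is the exact value
    eps / K * sum of gaps (identical at every round), and exploit_terms[t] is a
    Monte Carlo estimate of (1 - eps) * E[gap of the greedy arm] at round t + 1.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    gaps = [best - m for m in means]
    explore_term = eps * sum(gaps) / k  # uniform choice over the K arms
    greedy_gap_sum = [0.0] * horizon
    for _ in range(n_runs):
        counts = [0] * k
        sums = [0.0] * k
        for t in range(horizon):
            # Greedy arm from past rewards only (assumption: unpulled arms are
            # scored 0, and max() keeps the lowest index among ties).
            est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
            greedy = max(range(k), key=lambda a: est[a])
            greedy_gap_sum[t] += gaps[greedy]
            # Coin toss: with probability eps explore uniformly, otherwise exploit.
            arm = rng.randrange(k) if rng.random() < eps else greedy
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
    exploit_terms = [(1 - eps) * s / n_runs for s in greedy_gap_sum]
    return explore_term, exploit_terms
```

For example, `explore, exploit = split_regret_terms([0.5, 0.9], 0.1, 200)` gives per-round estimates; `200 * explore + sum(exploit)` is then a Monte Carlo estimate of the total expected regret over 200 rounds, which can be compared with a closed-form candidate.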