Problem 3: Epsilon-Greedy Algorithm - HMW 4

Problem 3: Epsilon-Greedy Algorithm - HMW 4

par Gabin Paul Jacques Leroy,
Nombre de réponses : 1

Hello,


I would like to know if my expression for the worst-case regret is correct. If I call I_t the Bernoulli random of parameter \epsilon_t, describing the toss of the coin at round t, then :


R_t=t\mu^*(t) -\sum_{i=1}^{t}\Bbb{E}[X_i^{J_i}]


where J_i is the random variable that determines at round i, which arm I choose and \mu^*(t)=\max_{1\leq j\leq t}\mu_j, kwoning that I call X_i^{k} the random variable that I get at roung $i$ from arm 1\leq k\leq K. Then the goal is to compute \Bbb{E}[X_i^{J_i}] . My idea is to condition on the values of I_i :


\Bbb{E}[X_i^{J_i}]=\Bbb{E}[X_i^{J_i} | I_i=1 ]\epsilon_i + \Bbb{E}[X_i^{J_i} | I_i=0 ](1-\epsilon_i)


One has \Bbb{E}[X_i^{J_i} | I_i=1]=\sum_{k=1}^{K}\Bbb{E}[X_i^k](1/K) since if at round i the toss is a success, then I choose uniformly over all the K arms. Now \Bbb{E}[X_i^{J_i}|I_i=0]=\Bbb{E}[X_i^{t^*}] where t^*:=\text{argmax}_{1\leq j\leq t}\hat{\mu}_j.

So with these two results


R_t=t\mu^*(t)-\sum_{i=1}^{t}\sum_{k=1}^{K}(\mu_k/K)\epsilon_i-\sum_{i=1}^{t}\Bbb{E}[X_i^{t^*}](1-\epsilon_i)


I would like to know if my reasoning so far is correct, any help would be appreciated, thank you! In particular I don't really now what to do with the last term \sum_{i=1}^{t}\Bbb{E}[X_i^{t^*}](1-\epsilon_i).

En réponse à Gabin Paul Jacques Leroy

Re: Problem 3: Epsilon-Greedy Algorithm - HMW 4

par Thomas Weinberger,
Dear Gabin,

Your approach looks reasonable! Regarding the last term: try to write down each \mathbb{E}[X_{i}^t*}] explicitly as a sum of products (regret when choosing arm 1 \leq t \leq k times the probability of arm t^* looking the best). Moreover, note that the latter probability can be upper bounded by the prob. that arm t^* looks better than just the optimal arm (arm 1 by convention).

You might also want to take a look at my recent post in the discussion forum.

Best,
Thomas