COM-406: Problem 3: Epsilon-Greedy Algorithm - HMW 4

Hello,

I would like to know if my expression for the worst-case regret is correct. If I call $I_t$ the Bernoulli random of parameter $\epsilon_t$ , describing the toss of the coin at round $t$ , then :

$R_t=t\mu^*(t) -\sum_{i=1}^{t}\Bbb{E}[X_i^{J_i}]$

where $J_i$ is the random variable that determines at round $i$ , which arm I choose and $\mu^*(t)=\max_{1\leq j\leq t}\mu_j$ , kwoning that I call $X_i^{k}$ the random variable that I get at roung $i$ from arm $1\leq k\leq K$ . Then the goal is to compute $\Bbb{E}[X_i^{J_i}]$ . My idea is to condition on the values of $I_i$ :

$\Bbb{E}[X_i^{J_i}]=\Bbb{E}[X_i^{J_i} | I_i=1 ]\epsilon_i + \Bbb{E}[X_i^{J_i} | I_i=0 ](1-\epsilon_i)$

One has $\Bbb{E}[X_i^{J_i} | I_i=1]=\sum_{k=1}^{K}\Bbb{E}[X_i^k](1/K)$ since if at round $i$ the toss is a success, then I choose uniformly over all the $K$ arms. Now $\Bbb{E}[X_i^{J_i}|I_i=0]=\Bbb{E}[X_i^{t^*}]$ where $t^*:=\text{argmax}_{1\leq j\leq t}\hat{\mu}_j$ .

So with these two results

$R_t=t\mu^*(t)-\sum_{i=1}^{t}\sum_{k=1}^{K}(\mu_k/K)\epsilon_i-\sum_{i=1}^{t}\Bbb{E}[X_i^{t^*}](1-\epsilon_i)$ .

I would like to know if my reasoning so far is correct, any help would be appreciated, thank you! In particular I don't really now what to do with the last term $\sum_{i=1}^{t}\Bbb{E}[X_i^{t^*}](1-\epsilon_i)$ .

Re: Problem 3: Epsilon-Greedy Algorithm - HMW 4

par Thomas Weinberger, dimanche, 6 novembre 2022, 21:19

Dear Gabin,

Your approach looks reasonable! Regarding the last term: try to write down each

$\mathbb{E}[X_{i}^t*}]$ explicitly as a sum of products (regret when choosing arm

$1 \leq t \leq k$ times the probability of arm

$t^*$ looking the best). Moreover, note that the latter probability can be upper bounded by the prob. that arm

$t^*$ looks better than just the optimal arm (arm 1 by convention).

You might also want to take a look at my recent post in the discussion forum.

Best,
Thomas