COM-406: Homework 4, Problem 4

Hi,

I am stuck on the Exp3 problem. I am struggling to find a meaningful expression for the expected reward in part i), one with a constant term plus higher-order terms in 1/n. Intuitively, the Exp3 algorithm eventually picks out the arm with the highest average reward, but where do I start with an expression?

Thank you :)

Re: Homework 4, Problem 4

by Thomas Weinberger - Sunday, 6 November 2022, 20:58

Dear Paula,

Please see my post in the discussion forum. You do not need to provide any expression, as only the asymptotic reward is relevant. The Exp3 algorithm is indeed capable of identifying the optimal arm, so the adversary should design its strategy accordingly. Just state this (simple) strategy and then give a numerical value for the expected regret, again neglecting higher-order terms.
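To make "Exp3 identifies the optimal arm" concrete, here is a minimal Python sketch of the textbook Exp3 update: exponential weights mixed with uniform exploration at rate gamma, and importance-weighted reward estimates for the pulled arm only. It is an illustration under assumptions, not the setting of Problem 4: the function name `exp3`, the parameter `gamma`, and the reward table (a fixed, oblivious adversary with rewards in [0, 1]) are hypothetical choices made here.

```python
import math
import random


def exp3(rewards, gamma, seed=0):
    """Run a basic Exp3 on a fixed table of rewards (illustrative sketch).

    rewards[t][i] is the reward in [0, 1] of arm i at round t; the table is
    fixed in advance (an oblivious adversary, assumed for illustration).
    Returns the sampling distribution Exp3 would use in the next round and
    the total reward it collected.
    """
    rng = random.Random(seed)
    k = len(rewards[0])
    weights = [1.0] * k

    def distribution():
        # Exponential weights mixed with uniform exploration of rate gamma.
        w_sum = sum(weights)
        return [(1 - gamma) * w / w_sum + gamma / k for w in weights]

    total = 0.0
    for row in rewards:
        probs = distribution()
        arm = rng.choices(range(k), weights=probs)[0]
        reward = row[arm]
        total += reward
        # Importance-weighted estimate: only the pulled arm's weight changes.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale so the largest weight is 1; the probabilities are unchanged.
        top = max(weights)
        for i in range(k):
            weights[i] /= top
    return distribution(), total
```

For example, with two arms paying 0.9 and 0.1 in every round, `exp3([[0.9, 0.1]] * 5000, gamma=0.1)` ends with almost all of the non-exploration mass on arm 0, so its probability approaches 1 - gamma + gamma/k = 0.95. The forced gamma/k exploration of the worse arm never vanishes, which is why a fixed gamma still pays a small per-round price.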

Best,
Thomas