Problem 2 Final Exam 2020

by Andreea-Alexandra Musat
Number of replies: 3

Hello, 


I have a question regarding the second problem of the 2020 final exam. We are given  n  i.i.d. samples from a Bernoulli distribution with parameter  \mu , where we know that  \mu \in [\kappa, 1 - \kappa] for some  \kappa \in [0, \frac{1}{2}] , and we are asked to estimate the entropy of this distribution accurately.
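(To fix notation for what follows: the quantity to be estimated is the binary entropy of the Bernoulli( \mu ) distribution,

 h = h_2(\mu) = - \mu \log{(\mu)} - (1 - \mu) \log{(1 - \mu)} .)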

1) The proposed solution is to construct an estimate  \hat{\mu}(S) of the true parameter  \mu and then to use it to compute the entropy. This should be done by taking into account that  \mu \in [\kappa, 1-\kappa] . However, the proposed estimator is  \hat{\mu}(S) = \min{ \{ \max{ \{ \kappa, \frac{1}{n} \sum_{i=1}^n X_i \} }, \frac{1}{2} \} } . I don't understand why it makes sense for our estimator to have a maximum value of  \frac{1}{2} . This means that if  \mu > \frac{1}{2} (e.g.  \kappa = \frac{1}{4}, \mu = 0.7 , which is perfectly fine according to our assumptions), our estimate will never exceed  \frac{1}{2} . I guess instead we'd want  \hat{\mu}(S) = \min{ \{ \max{ \{ \kappa, \frac{1}{n} \sum_{i=1}^n X_i \} }, 1 - \kappa \} } in order to make sure that our estimator stays in the correct range?
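For concreteness, here is a minimal numerical sketch of the corrected plug-in estimator I have in mind (the function names are just mine for illustration, not from the exam solution):

```python
import numpy as np

def estimate_mu(samples, kappa):
    """Clip the empirical mean into [kappa, 1 - kappa]."""
    empirical_mean = np.mean(samples)
    return min(max(kappa, empirical_mean), 1.0 - kappa)

def binary_entropy(p):
    """h_2(p) = -p log(p) - (1 - p) log(1 - p), natural log; assumes 0 < p < 1."""
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

def estimate_entropy(samples, kappa):
    """Plug-in entropy estimate h_2(mu_hat(S))."""
    return binary_entropy(estimate_mu(samples, kappa))

# Example: kappa = 1/4, true mu = 0.7 (allowed, since 0.7 <= 1 - kappa = 0.75)
rng = np.random.default_rng(0)
samples = rng.binomial(1, 0.7, size=1000)
print(estimate_mu(samples, kappa=0.25), estimate_entropy(samples, kappa=0.25))
```

With the originally proposed cap at  \frac{1}{2} , this example would be clipped to  0.5 even though the empirical mean is close to  0.7 .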


2) For the second part, I understand that  X_i - \mu is  \frac{1}{4} -subgaussian and the subsequent bound on  \mathbb{P}( | \hat{\mu}(S) - \mu | \geq \epsilon ) (this follows from Hoeffding's lemma applied to the Bernoulli random variable together with the subgaussian tail bound). Just to make sure: do you obtain the bound on  \mathbb{P}( | \hat{h}(S) - h | \geq \epsilon ) via the mean value theorem applied to  h_2 ?
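Concretely, the bound I mean here is (using Hoeffding's lemma for each  X_i - \mu , the standard subgaussian tail bound for the empirical mean, and the fact that clipping into  [\kappa, 1-\kappa] can only move the empirical mean closer to  \mu , since  \mu lies in that interval):

 \mathbb{P}\left( | \hat{\mu}(S) - \mu | \geq \epsilon \right) \leq \mathbb{P}\left( \left| \frac{1}{n} \sum_{i=1}^n X_i - \mu \right| \geq \epsilon \right) \leq 2 \exp\left( -2 n \epsilon^2 \right) .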


Thank you!

Andreea

In reply to Andreea-Alexandra Musat

Re: Problem 2 Final Exam 2020

by Thomas Weinberger
Hello Andreea,

1) You are of course correct, it should read \hat{\mu}(S) := \min\{\max\{\kappa, \frac{1}{n}\sum_{i=1}^n X_i\}, 1-\kappa\}
2) I don't think this is due to the mean value theorem. I will try to reconstruct the argument soon and will then post it here (until then, don't worry, this will not appear in the midterm...)

Best,
Thomas
In reply to Thomas Weinberger

Re: Problem 2 Final Exam 2020

by Andreea-Alexandra Musat
Hello Thomas,

Thanks for the reply!

To expand a bit on my reasoning for (2):
 h_2(x) = - x \log{(x)} - (1 - x) \log{(1 - x)} is continuous and differentiable on  (0, 1) , and thus for any  a, b \in (0, 1) there exists  c between  a and  b such that  h_2^\prime (c) = \frac{h_2(b) - h_2(a)}{b - a} (*). Because  X_i - \mu is  \frac{1}{4} -subgaussian, we have that  \mathbb{P}(| \hat{\mu}(S) - \mu| \geq \epsilon) \leq \delta (for some  \epsilon that depends on  \delta ). If we multiply both sides inside the probability by  | h_2^\prime(c) | , we get  \mathbb{P}( |h_2^\prime (c) | \, | \hat{\mu}(S) - \mu | \geq | h_2^\prime (c) | \epsilon  ) \leq \delta , and the bound on  \mathbb{P}(| h_2(\hat{\mu}(S)) - h_2(\mu) | \geq \epsilon^\prime) follows from (*) with  a = \hat{\mu}(S) and  b = \mu , together with the fact that  h_2^\prime(c) is bounded: since both  \hat{\mu}(S) and  \mu lie in  [\kappa, 1-\kappa] , so does  c , and  | h_2^\prime(x) | = | \log{\frac{1-x}{x}} | \leq \log{\frac{1-\kappa}{\kappa}} on that interval.
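Spelled out, the last step would then be (a sketch, assuming natural logarithms and writing  L := \log{\frac{1-\kappa}{\kappa}} , which requires  \kappa > 0 to be finite):

 | h_2(\hat{\mu}(S)) - h_2(\mu) | = | h_2^\prime(c) | \, | \hat{\mu}(S) - \mu | \leq L \, | \hat{\mu}(S) - \mu | ,

and therefore, using the concentration bound for  \hat{\mu}(S) with  \epsilon = \frac{\epsilon^\prime}{L} :

 \mathbb{P}\left( | h_2(\hat{\mu}(S)) - h_2(\mu) | \geq \epsilon^\prime \right) \leq \mathbb{P}\left( | \hat{\mu}(S) - \mu | \geq \frac{\epsilon^\prime}{L} \right) \leq 2 \exp\left( - \frac{2 n (\epsilon^\prime)^2}{L^2} \right) .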
But of course, there might be better/shorter ways of getting to this.

Best,
Andreea
In reply to Andreea-Alexandra Musat

Re: Problem 2 Final Exam 2020

by Thomas Weinberger
Hi Andreea,

Thanks for further expanding. Your reasoning is absolutely correct!

Best,
Thomas