Paper 4, Section II, H

Statistics
Part IB, 2013

Explain the notion of a sufficient statistic.
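A standard formulation, given as a sketch: a statistic $T(X)$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T(X)=t$ does not depend on $\theta$. The factorization criterion states that $T$ is sufficient if and only if the joint mass (or density) function can be written as

$$f_X(x;\theta)=g\big(T(x),\theta\big)\,h(x)$$

for some functions $g$ and $h$, where $h$ does not involve $\theta$.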

Suppose $X$ is a random variable with distribution $F$ taking values in $\{1, \ldots, 6\}$, with $P(X=i)=p_{i}$. Let $x_{1}, \ldots, x_{n}$ be a sample from $F$. Suppose $n_{i}$ is the number of these $x_{j}$ that are equal to $i$. Use a factorization criterion to explain why $\left(n_{1}, \ldots, n_{6}\right)$ is sufficient for $\theta=\left(p_{1}, \ldots, p_{6}\right)$.
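Applying the criterion, as a sketch: each observation $x_j$ contributes a factor $p_{x_j}$ to the likelihood, and grouping equal values gives

$$f(x_1,\ldots,x_n;\theta)=\prod_{j=1}^{n}p_{x_j}=\prod_{i=1}^{6}p_i^{\,n_i},$$

which has the form $g\big((n_1,\ldots,n_6),\theta\big)\,h(x)$ with $h(x)\equiv 1$. The likelihood depends on the sample only through the counts, so $(n_1,\ldots,n_6)$ is sufficient for $\theta$.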

Let $H_{0}$ be the hypothesis that $p_{i}=1 / 6$ for all $i$. Derive the statistic of the generalized likelihood ratio test of $H_{0}$ against the alternative that this is not a good fit.
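A sketch of the derivation: under $H_0$ the likelihood is $(1/6)^n$. Under the unrestricted alternative, maximizing $\sum_i n_i\log p_i$ subject to $\sum_i p_i=1$ (for example with a Lagrange multiplier) gives $\hat p_i=n_i/n$. Hence, with the convention that terms with $n_i=0$ contribute nothing,

$$\Lambda=\frac{\sup_{\theta}\prod_{i}p_i^{n_i}}{\prod_{i}(1/6)^{n_i}}=\prod_{i=1}^{6}\left(\frac{6n_i}{n}\right)^{n_i},\qquad 2\log\Lambda=2\sum_{i=1}^{6}n_i\log\frac{n_i}{n/6}.$$

The alternative has 5 free parameters and $H_0$ has none, so by Wilks' theorem $2\log\Lambda$ is approximately $\chi^2_5$ under $H_0$ for large $n$, and $H_0$ is rejected for large values of the statistic.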

Assuming that $n_{i} \approx n / 6$ when $H_{0}$ is true and $n$ is large, show that this test can be approximated by a chi-squared test using a test statistic

$$T=-n+\frac{6}{n} \sum_{i=1}^{6} n_{i}^{2}$$
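A sketch of the approximation: write $e=n/6$ and $\delta_i=n_i-e$, so that $\sum_i\delta_i=0$ and $\delta_i/e$ is small under $H_0$. Using $\log(1+u)\approx u-u^2/2$ and dropping terms of order $\delta_i^3/e^2$,

$$\begin{aligned}
2\log\Lambda&=2\sum_{i=1}^{6}(e+\delta_i)\log\!\left(1+\frac{\delta_i}{e}\right)
\approx 2\sum_{i=1}^{6}(e+\delta_i)\left(\frac{\delta_i}{e}-\frac{\delta_i^2}{2e^2}\right)\\
&\approx 2\sum_{i=1}^{6}\delta_i+\sum_{i=1}^{6}\frac{\delta_i^2}{e}
=\sum_{i=1}^{6}\frac{(n_i-n/6)^2}{n/6}\\
&=\frac{6}{n}\sum_{i=1}^{6}n_i^2-2\sum_{i=1}^{6}n_i+n
=-n+\frac{6}{n}\sum_{i=1}^{6}n_i^2=T.
\end{aligned}$$

So $T$ is Pearson's chi-squared statistic, and it inherits the approximate $\chi^2_5$ distribution under $H_0$.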

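A quick numerical check of this approximation, as a sketch. The helper names `glrt_statistic` and `pearson_T` are hypothetical, and the counts are invented illustrative values close to $n/6$, not data from the question.

```python
import math


def glrt_statistic(counts):
    """2 log Lambda = 2 * sum n_i * log(n_i / (n/k)); cells with n_i = 0 contribute 0."""
    n = sum(counts)
    k = len(counts)  # k = 6 for a die
    expected = n / k
    return 2 * sum(c * math.log(c / expected) for c in counts if c > 0)


def pearson_T(counts):
    """T = -n + (k/n) * sum n_i^2, which equals sum (n_i - n/k)^2 / (n/k)."""
    n = sum(counts)
    k = len(counts)
    return -n + (k / n) * sum(c * c for c in counts)


# Hypothetical counts for n = 600 rolls, each near n/6 = 100 (illustrative only).
counts = [105, 96, 98, 110, 92, 99]
print(glrt_statistic(counts), pearson_T(counts))
```

Here $T=2.1$ and the GLRT statistic is slightly smaller (about $2.08$); the gap comes from the dropped cubic terms and shrinks as the counts approach $n/6$.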
Suppose $n=100$ and $T=8.12$. Would you reject $H_{0}$? Explain your answer.
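A sketch of the decision, assuming the $\chi^2_5$ approximation (reasonable here since each expected count $100/6\approx 16.7$ is not small): the upper 5% point of $\chi^2_5$ is about $11.07$, and $8.12<11.07$, so $H_0$ would not be rejected at the 5% level; the p-value is roughly $0.15$, so it would not be rejected at 10% either. The helper `chi2_sf_5df` is a hypothetical name; it uses the closed form $P(\chi^2_5>x)=\operatorname{erfc}(\sqrt{x/2})+\sqrt{2x/\pi}\,e^{-x/2}(1+x/3)$ so that only the standard library is needed.

```python
import math


def chi2_sf_5df(x):
    """Upper tail P(chi^2_5 > x); this closed form is valid for 5 degrees of freedom only."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)


T_observed = 8.12
CRITICAL_5_PERCENT = 11.07  # upper 5% point of chi^2_5 from standard tables

p_value = chi2_sf_5df(T_observed)
print(f"p-value = {p_value:.3f}")
print("reject H0 at 5%" if T_observed > CRITICAL_5_PERCENT else "do not reject H0 at 5%")
```

With `scipy` available, `scipy.stats.chi2.sf(8.12, 5)` gives the same tail probability for any degrees of freedom; the closed form above is just a dependency-free stand-in.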