A1.13

Computational Statistics and Statistical Modelling
Part II, 2003

(i) Suppose $Y_{i}$, $1 \leqslant i \leqslant n$, are independent binomial observations, with $Y_{i} \sim Bi\left(t_{i}, \pi_{i}\right)$, $1 \leqslant i \leqslant n$, where $t_{1}, \ldots, t_{n}$ are known, and we wish to fit the model

$$\omega: \log \frac{\pi_{i}}{1-\pi_{i}}=\mu+\beta^{T} x_{i} \quad \text{for each } i,$$

where $x_{1}, \ldots, x_{n}$ are given covariates, each of dimension $p$. Let $\hat{\mu}, \hat{\beta}$ be the maximum likelihood estimators of $\mu, \beta$. Derive equations for $\hat{\mu}, \hat{\beta}$ and state without proof the form of the approximate distribution of $\hat{\beta}$.
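As a hedged sketch of where such a derivation might start (the shorthand $\eta_{i}$ is introduced here and is not part of the question), the log-likelihood under $\omega$ can be written as

$$\ell(\mu, \beta)=\sum_{i=1}^{n}\left[y_{i} \eta_{i}-t_{i} \log \left(1+e^{\eta_{i}}\right)\right]+\text{constant}, \qquad \eta_{i}=\mu+\beta^{T} x_{i},$$

and setting its partial derivatives with respect to $\mu$ and $\beta$ to zero gives

$$\sum_{i=1}^{n}\left(y_{i}-t_{i} \hat{\pi}_{i}\right)=0, \qquad \sum_{i=1}^{n} x_{i}\left(y_{i}-t_{i} \hat{\pi}_{i}\right)=0, \qquad \hat{\pi}_{i}=\frac{e^{\hat{\mu}+\hat{\beta}^{T} x_{i}}}{1+e^{\hat{\mu}+\hat{\beta}^{T} x_{i}}}.$$

These $p+1$ equations have no closed-form solution in general and are usually solved iteratively.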

(ii) In 1975, data were collected on the 3-year survival status of patients suffering from a type of cancer, yielding the following table.

\begin{tabular}{ccrr}
 & & \multicolumn{2}{c}{survive?} \\
age in years & malignant & yes & no \\
under 50 & no & 77 & 10 \\
under 50 & yes & 51 & 13 \\
50--69 & no & 51 & 11 \\
50--69 & yes & 38 & 20 \\
70+ & no & 7 & 3 \\
70+ & yes & 6 & 3
\end{tabular}

Here the second column represents whether the initial tumour was not malignant or was malignant.

Let $Y_{ij}$ be the number surviving, for age group $i$ and malignancy status $j$, for $i=1,2,3$ and $j=1,2$, and let $t_{ij}$ be the corresponding total number. Thus $Y_{11}=77$, $t_{11}=87$. Assume $Y_{ij} \sim Bi\left(t_{ij}, \pi_{ij}\right)$, $1 \leqslant i \leqslant 3$, $1 \leqslant j \leqslant 2$. The results from fitting the model

$$\log \left(\pi_{ij} /\left(1-\pi_{ij}\right)\right)=\mu+\alpha_{i}+\beta_{j}$$

with $\alpha_{1}=0$, $\beta_{1}=0$ give $\hat{\beta}_{2}=-0.7328$ ($\mathrm{se}=0.2985$), and deviance $=0.4941$. What do you conclude?
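The quoted figures can be turned into rough test statistics. The sketch below is only an arithmetic aid, and it rests on assumptions that are not stated in the question: a large-sample normal approximation for $\hat{\beta}_{2}$, and a chi-squared reference distribution for the deviance with $6-4=2$ degrees of freedom (six binomial observations, four free parameters $\mu, \alpha_{2}, \alpha_{3}, \beta_{2}$). All variable names are illustrative.

```python
import math

beta2_hat = -0.7328  # estimate quoted in the question
se_beta2 = 0.2985    # its standard error, as quoted
deviance = 0.4941    # residual deviance, as quoted

# Assumed residual degrees of freedom: 6 binomial cells minus
# 4 free parameters (mu, alpha_2, alpha_3, beta_2).
residual_df = 6 - 4

# Wald statistic and two-sided tail probability under N(0, 1).
z = beta2_hat / se_beta2
p_wald = math.erfc(abs(z) / math.sqrt(2))

# For exactly 2 degrees of freedom the chi-squared upper tail
# probability is exp(-x / 2), so no statistics library is needed.
p_deviance = math.exp(-deviance / 2)

# Estimated odds ratio of survival, malignant versus not malignant,
# at a fixed age group.
odds_ratio = math.exp(beta2_hat)

print(f"z = {z:.3f}, two-sided p = {p_wald:.4f}")
print(f"deviance = {deviance} on {residual_df} df, p = {p_deviance:.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
```

A small deviance relative to its degrees of freedom would suggest the additive model is adequate, while a Wald statistic well outside $\pm 2$ would suggest that malignancy status matters after allowing for age.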

Why do we take $\alpha_{1}=0$, $\beta_{1}=0$ in the model?

What "residuals" should you compute, and to which distribution would you refer them?
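One way to explore the residuals question numerically is sketched below. It assumes that Pearson and deviance residuals are the kind intended (the question leaves this open) and that each would be compared, approximately, with $N(0,1)$. The model is fitted by Newton-Raphson in plain Python; the helper names `design_row`, `solve`, `fitted_probs`, `information` and `fit` are hypothetical and chosen for illustration only.

```python
import math

# Rows of the table: (age group i, malignancy j, survivors y, total t).
DATA = [
    (1, 1, 77, 87), (1, 2, 51, 64),
    (2, 1, 51, 62), (2, 2, 38, 58),
    (3, 1, 7, 10), (3, 2, 6, 9),
]

def design_row(i, j):
    # Columns: mu, alpha_2, alpha_3, beta_2 (with alpha_1 = beta_1 = 0).
    return [1.0, float(i == 2), float(i == 3), float(j == 2)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fitted_probs(X, theta):
    # Inverse logit of the linear predictor for each cell.
    return [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x, theta))))
            for x in X]

def information(X, t, pi):
    # Fisher information X^T W X with W = diag(t * pi * (1 - pi)).
    p = len(X[0])
    return [[sum(x[a] * x[b] * ti * q * (1 - q) for x, ti, q in zip(X, t, pi))
             for b in range(p)] for a in range(p)]

def fit(data, iterations=25):
    X = [design_row(i, j) for i, j, _, _ in data]
    y = [row[2] for row in data]
    t = [row[3] for row in data]
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        pi = fitted_probs(X, theta)
        score = [sum(x[a] * (yi - ti * q) for x, yi, ti, q in zip(X, y, t, pi))
                 for a in range(len(theta))]
        step = solve(information(X, t, pi), score)
        theta = [th + s for th, s in zip(theta, step)]
    pi = fitted_probs(X, theta)
    return theta, pi, information(X, t, pi)

theta, pi, info = fit(DATA)
# Standard error of beta_2: square root of the last diagonal entry
# of the inverse information matrix.
se_beta2 = math.sqrt(solve(info, [0.0, 0.0, 0.0, 1.0])[3])

pearson, dev_res = [], []
for (_, _, y, t), q in zip(DATA, pi):
    mu = t * q  # fitted number of survivors in this cell
    pearson.append((y - mu) / math.sqrt(mu * (1 - q)))
    d2 = 2 * (y * math.log(y / mu) + (t - y) * math.log((t - y) / (t - mu)))
    dev_res.append(math.copysign(math.sqrt(max(d2, 0.0)), y - mu))
deviance = sum(d * d for d in dev_res)

print("beta_2 estimate:", round(theta[3], 4), "se:", round(se_beta2, 4))
print("deviance:", round(deviance, 4))
print("Pearson residuals:", [round(r, 3) for r in pearson])
print("deviance residuals:", [round(r, 3) for r in dev_res])
```

The printed estimate, standard error and deviance can be compared with the values quoted in the question as a check on the fit. With only six cells, any comparison of the residuals with $N(0,1)$ is necessarily informal.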