Paper 4, Section II, K

Statistical Modelling
Part II, 2014

In a study on infant respiratory disease, data are collected on a sample of 2074 infants. The information collected includes whether or not each infant developed a respiratory disease in the first year of life, the gender of each infant, and how each infant was fed, recorded as one of three categories (breast-fed, bottle-fed and supplement). The data are tabulated in R as follows:

$$
\begin{array}{rrrrr}
 & \text{disease} & \text{nondisease} & \text{gender} & \text{food} \\
1 & 77 & 381 & \text{Boy} & \text{Bottle-fed} \\
2 & 19 & 128 & \text{Boy} & \text{Supplement} \\
3 & 47 & 447 & \text{Boy} & \text{Breast-fed} \\
4 & 48 & 336 & \text{Girl} & \text{Bottle-fed} \\
5 & 16 & 111 & \text{Girl} & \text{Supplement} \\
6 & 31 & 433 & \text{Girl} & \text{Breast-fed}
\end{array}
$$
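As a quick consistency check, the table can be entered as plain Python lists (a sketch; the variable names are illustrative, not taken from the original R session). The row totals $n_i = \text{disease} + \text{nondisease}$ are the binomial denominators, presumably the `total` used as weights in the `glm` calls, and they sum to the stated sample size of 2074.

```python
# Rows of the table above: (disease, nondisease, gender, food)
rows = [
    (77, 381, "Boy", "Bottle-fed"),
    (19, 128, "Boy", "Supplement"),
    (47, 447, "Boy", "Breast-fed"),
    (48, 336, "Girl", "Bottle-fed"),
    (16, 111, "Girl", "Supplement"),
    (31, 433, "Girl", "Breast-fed"),
]

# Binomial denominators n_i = disease + nondisease for each covariate pattern
totals = [d + nd for d, nd, _, _ in rows]

# Observed proportion of infants developing the disease in each group
proportions = [d / n for (d, _, _, _), n in zip(rows, totals)]

print(totals)       # [458, 147, 494, 384, 127, 464]
print(sum(totals))  # 2074
for (d, nd, g, f), p in zip(rows, proportions):
    print(f"{g:4s} {f:10s} {p:.3f}")
```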

Write down the model being fitted by the R commands on the following page:

The following (slightly abbreviated) output from R is obtained.
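Assuming the commands fit a binomial GLM with R's default logit link and main effects only for gender and food (the larger model `fit1` further down adds the `gender:food` interaction), and that Boy and Bottle-fed are the baseline levels under R's default treatment contrasts, the model would take the following form. Here $Y_i$ is the number of diseased infants among the $n_i$ in row $i$, and the labels $\beta_{\mathrm{G}}, \beta_{\mathrm{Br}}, \beta_{\mathrm{S}}$ are notation chosen for this sketch:

$$
\begin{aligned}
Y_i &\sim \operatorname{Bin}(n_i, p_i) \quad \text{independently}, \quad i = 1, \dots, 6, \\
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_{\mathrm{G}}\, \mathbb{1}\{\text{gender}_i = \text{Girl}\} + \beta_{\mathrm{Br}}\, \mathbb{1}\{\text{food}_i = \text{Breast-fed}\} + \beta_{\mathrm{S}}\, \mathbb{1}\{\text{food}_i = \text{Supplement}\}.
\end{aligned}
$$

Supplying the proportion `disease/total` as the response together with `weights = total` is R's proportion-plus-weights way of specifying this binomial likelihood.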

Briefly explain the justification for the standard errors presented in the output above.
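The standard errors in a GLM summary are the square roots of the diagonal of the inverse Fisher information $(X^\top W X)^{-1}$ evaluated at the MLE, justified by the asymptotic normality of the MLE; for the binomial family the dispersion is fixed at 1, so no scale parameter is estimated. The pure-Python sketch below makes this concrete using Fisher scoring (which coincides with IRLS and with Newton-Raphson for the canonical logit link). It assumes the main-effects model with Boy and Bottle-fed baselines; the design encoding and helper names are hypothetical.

```python
import math

# Hypothetical design encoding: (disease, total, girl, breast-fed, supplement),
# with Boy and Bottle-fed as baselines (R's default treatment contrasts)
data = [
    (77, 458, 0, 0, 0),
    (19, 147, 0, 0, 1),
    (47, 494, 0, 1, 0),
    (48, 384, 1, 0, 0),
    (16, 127, 1, 0, 1),
    (31, 464, 1, 1, 0),
]
names = ["(Intercept)", "genderGirl", "foodBreast-fed", "foodSupplement"]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fisher_info(beta):
    """Return the information X^T W X and the score X^T (y - n p) at beta."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    score = [0.0] * k
    for y, n, *z in data:
        x = [1.0] + [float(v) for v in z]
        eta = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = n * p * (1.0 - p)  # binomial variance weight, dispersion fixed at 1
        for i in range(k):
            score[i] += x[i] * (y - n * p)
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return info, score


# Fisher scoring iterations starting from beta = 0
beta = [0.0] * len(names)
for _ in range(25):
    info, score = fisher_info(beta)
    step = solve(info, score)
    beta = [b + s for b, s in zip(beta, step)]

# Standard errors: sqrt of the diagonal of the inverse information at the MLE
info, _ = fisher_info(beta)
k = len(beta)
inv_cols = [solve(info, [1.0 if i == j else 0.0 for i in range(k)]) for j in range(k)]
se = [math.sqrt(inv_cols[j][j]) for j in range(k)]

for name, b, s in zip(names, beta, se):
    print(f"{name:15s} {b:8.4f} {s:7.4f}")
```

If the assumed model matches the one fitted in R, the `foodBreast-fed` row should agree with the estimate $-0.6693$ and standard error $0.153$ that appear in the confidence-interval code that follows.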

Explain the relevance of the output of the following R code to the data being studied, justifying your answer:

> exp(c(-0.6693 - 1.96 * 0.153, -0.6693 + 1.96 * 0.153))
[1] 0.3793940 0.6911351

[Hint: It may help to recall that if $Z \sim N(0,1)$ then $\mathbb{P}(Z \geqslant 1.96) = 0.025$.]
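The R snippet has the shape of a Wald interval. Assuming $-0.6693$ is the estimated breast-fed coefficient (a log odds ratio relative to bottle-fed, adjusted for gender) and $0.153$ its standard error, the estimate $\pm 1.96$ standard errors gives an approximate 95% interval for the log odds ratio, and exponentiating the endpoints gives an interval for the odds ratio itself. The same arithmetic in Python:

```python
import math

# Values from the R snippet: coefficient estimate and its standard error
# (assumed here to be the breast-fed vs bottle-fed log odds ratio)
estimate = -0.6693
std_error = 0.153
z = 1.96  # upper 2.5% point of N(0, 1), as in the hint

# 95% Wald interval on the log-odds scale, then mapped to the odds-ratio scale
log_lower = estimate - z * std_error
log_upper = estimate + z * std_error
odds_ratio_ci = (math.exp(log_lower), math.exp(log_upper))

print(f"{odds_ratio_ci[0]:.7f} {odds_ratio_ci[1]:.7f}")  # 0.3793940 0.6911351
```

Because both endpoints are below 1, an odds ratio of 1 (no difference between the two feeding methods) lies outside the interval.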

Let $D_1$ be the deviance of the model fitted by the following R command.

> fit1 <- glm(disease/total ~ gender + food + gender:food,
+             family = binomial, weights = total)

What is the numerical value of $D_1$? Which of the two models that have been fitted should you prefer, and why?
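The binomial deviance compares fitted probabilities $\hat p_i$ with the observed proportions $y_i/n_i$:
$D = 2\sum_i \left\{ y_i \log\frac{y_i}{n_i \hat p_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - n_i \hat p_i} \right\}$.
The model `fit1` has six parameters for six covariate patterns, so (provided all six are estimable) it is saturated: its fitted probabilities equal the observed proportions and every term vanishes. The sketch below evaluates this formula with a hypothetical helper `binomial_deviance`, for the saturated fit and, for contrast, for a single pooled probability.

```python
import math

# (disease, total) for the six gender-by-food groups in the table
groups = [(77, 458), (19, 147), (47, 494), (48, 384), (16, 127), (31, 464)]


def binomial_deviance(groups, fitted_probs):
    """Hypothetical helper: 2 * sum of y log(y/mu) + (n-y) log((n-y)/(n-mu))."""
    dev = 0.0
    for (y, n), p in zip(groups, fitted_probs):
        mu = n * p  # fitted number of diseased infants in this group
        if y > 0:
            dev += y * math.log(y / mu)
        if n - y > 0:
            dev += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * dev


# A saturated model reproduces each observed proportion exactly,
# so its deviance is zero up to floating-point rounding
saturated = [y / n for y, n in groups]
print(binomial_deviance(groups, saturated))

# Any other fit gives a positive deviance, e.g. one common disease probability
pooled = sum(y for y, _ in groups) / sum(n for _, n in groups)
print(binomial_deviance(groups, [pooled] * len(groups)))
```

If the first model contains only the four main-effect parameters, comparing it with `fit1` amounts to referring the difference in their deviances to a $\chi^2_2$ distribution, since the interaction adds $6 - 4 = 2$ parameters.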