Paper 4, Section II, J

Statistical Modelling
Part II, 2010

Every day, Barney the darts player comes to our laboratory. We record his facial expression, which can be either "mad", "weird" or "relaxed", as well as how many units of beer he has drunk that day. Each day he tries a hundred times to hit the bull's-eye, and we write down how often he succeeds. The data look like this:

\begin{tabular}{rrrr} \multicolumn{1}{l}{} & & & \ Day & Beer & Expression & BullsEye \ 1 & 3 & Mad & 30 \ 2 & 3 & Mad & 32 \ \vdots & \vdots & \vdots & \vdots \ 60 & 2 & Mad & 37 \ 61 & 4 & Weird & 30 \ \vdots & \vdots & \vdots & \vdots \ 110 & 4 & Weird & 28 \ 111 & 2 & Relaxed & 35 \ \vdots & \vdots & \vdots & \vdots \ 150 & 3 & Relaxed & 31 \end{tabular}

Write down a reasonable model for Y1,,YnY_{1}, \ldots, Y_{n}, where n=150n=150 and where YiY_{i} is the number of times Barney has hit bull's-eye on the ii th day. Explain briefly why we may wish initially to include interactions between the variables. Write the R\mathrm{R} code to fit your model.

The scientist of the above story fitted her own generalized linear model, and subsequently obtained the following summary (abbreviated):

Why are ExpressionMad and Beer:ExpressionMad not listed? Suppose on a particular day, Barney's facial expression is weird, and he drank three units of beer. Give the linear predictor in the scientist's model for this day.

Based on the summary, how could you improve your model? How could one fit this new model in RR (without modifying the data file)?