Paper 4, Section I, J

The numbers of ear infections observed among beach and non-beach (mostly pool) swimmers were recorded, along with explanatory variables: frequency, location, age, and sex. The data are aggregated by group, with a total of 24 groups defined by the explanatory variables.

\begin{array}{ll} \text { freq } & \mathrm{F}=\text { frequent, } \mathrm{NF}=\text { infrequent } \\ \text { loc } & \mathrm{NB}=\text { non-beach, } \mathrm{B}=\text { beach } \\ \text { age } & 15-19,20-24,25-29 \\ \text { sex } & \mathrm{F}=\text { female, } \mathrm{M}=\text { male } \\ \text { count } & \text { the number of infections reported over a fixed time period } \\ \mathrm{n} & \text { the total number of swimmers } \end{array}

The data look like this:

\begin{array}{lrrrrrr} & \text { count } & \text { n } & \text { freq } & \text { loc } & \text { sex } & \text { age } \\ 1 & 68 & 31 & F & \text { NB } & \text { M } & 15-19 \\ 2 & 14 & 4 & F & \text { NB } & \text { F } & 15-19 \\ 3 & 35 & 12 & F & \text { NB } & \text { M } & 20-24 \\ 4 & 16 & 11 & F & \text { NB } & \text { F } & 20-24 \\ {[\ldots]} & & & & & & \\ 23 & 5 & 15 & \text { NF } & \text { B } & \text { M } & 25-29 \\ 24 & 6 & 6 & \text { NF } & \text { B } & \text { F } & 25-29 \end{array}

Let $\mu_{j}$ denote the expected number of ear infections of a person in group $j$. Explain why it is reasonable to model $\text{count}_{j}$ as Poisson with mean $n_{j} \mu_{j}$.
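One possible line of reasoning can be sketched as follows. It assumes (this is not stated in the question) that the individual infection counts $Y_{j1}, \ldots, Y_{jn_j}$ of the $n_j$ swimmers in group $j$ are independent and each Poisson with mean $\mu_j$; the symbols $Y_{ji}$ are introduced here only for illustration.

```latex
\text{count}_{j} = \sum_{i=1}^{n_{j}} Y_{ji},
\qquad Y_{ji} \overset{\text{iid}}{\sim} \operatorname{Poisson}(\mu_{j})
\;\Longrightarrow\;
\text{count}_{j} \sim \operatorname{Poisson}(n_{j} \mu_{j}),
\qquad
\mathbb{E}\left(\text{count}_{j}\right) = n_{j} \mu_{j}.
```

The implication uses the fact that a sum of independent Poisson random variables is again Poisson, with mean equal to the sum of the individual means.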

We fit the following Poisson model:

\log \left(\mathbb{E}\left(\operatorname{count}_{j}\right)\right)=\log \left(n_{j} \mu_{j}\right)=\log \left(n_{j}\right)+\mathbf{x}_{j} \beta

where $\log \left(n_{j}\right)$ is an offset, i.e. an explanatory variable with known coefficient $1$. R produces the following (abbreviated) summary for the main effects model:

Why are the expressions freqF, locB, age15-19, and sexF not listed?

Suppose that we plan to observe a group of 20 female, non-frequent, beach swimmers, aged 20-24. Give an expression (using the coefficient estimates from the model fitted above) for the expected number of ear infections in this group.
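The form of such a prediction can be sketched in Python. This is only an illustration under stated assumptions: the coefficient names (freqNF, locNB, age20-24, age25-29, sexM) are inferred from the baseline levels mentioned in the question, the function `expected_count` is a hypothetical helper, and the numerical values are placeholders, not the estimates from the fitted model.

```python
import math


def expected_count(n, coefs, active_terms):
    """Return n * exp(intercept + sum of the active coefficients).

    n            : group size (the offset enters as log(n) with coefficient 1)
    coefs        : dict mapping term names to coefficient values
    active_terms : non-baseline terms that apply to the group
    """
    eta = coefs["(Intercept)"] + sum(coefs[t] for t in active_terms)
    return n * math.exp(eta)


# Placeholder values only: NOT the estimates from the fitted model.
placeholder_coefs = {
    "(Intercept)": 0.0,
    "freqNF": 0.0,
    "locNB": 0.0,
    "age20-24": 0.0,
    "age25-29": 0.0,
    "sexM": 0.0,
}

# 20 female, non-frequent, beach swimmers aged 20-24.
# Assuming locB and sexF are baseline levels, only freqNF and age20-24
# contribute beyond the intercept.
group_terms = ["freqNF", "age20-24"]
prediction = expected_count(20, placeholder_coefs, group_terms)
```

Substituting the actual estimates from the summary for the placeholder values would give the expected number of infections for this group under the main effects model.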

Now, suppose that we allow for interaction between the variables age and sex. Give the R command for fitting this model. We test for the effect of this interaction by producing the following (abbreviated) ANOVA table:

Briefly explain what test is performed, and what you would conclude from it. Does either of these models fit the data well?