The numbers of ear infections observed among beach and non-beach (mostly pool) swimmers were recorded, along with four explanatory variables: swimming frequency, location, age, and sex. The data are aggregated by group, with a total of 24 groups defined by the combinations of the explanatory variables.
The variables are:
freq: F = frequent, NF = infrequent
loc: NB = non-beach, B = beach
age: 15−19, 20−24, 25−29
sex: F = female, M = male
count: the number of infections reported over a fixed time period
n: the total number of swimmers
The data look like this:
group: 1, 2, 3, 4, […], 23, 24
count: 6814351656
n: 3141211156
freq: F, F, F, F, […], NF, NF
loc: NB, NB, NB, NB, […], B, B
sex: M, F, M, F, […], M, F
age: 15−19, 15−19, 20−24, 20−24, […], 25−29, 25−29
Let $\mu_j$ denote the expected number of ear infections for a person in group $j$. Explain why it is reasonable to model $\text{count}_j$ as Poisson with mean $n_j\mu_j$.
We fit the following Poisson model:
$$\log\big(E(\text{count}_j)\big) = \log(n_j\mu_j) = \log(n_j) + x_j\beta$$
where $\log(n_j)$ is an offset, i.e. an explanatory variable with known coefficient 1. R produces the following (abbreviated) summary for the main effects model:
Why are the terms freqF, locB, age15−19, and sexF not listed?
Suppose that we plan to observe a group of 20 female, infrequent, beach swimmers, aged 20−24. Give an expression (using the coefficient estimates from the model fitted above) for the expected number of ear infections in this group.
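As a rough illustration of how a log link with an offset turns a linear predictor into an expected count, the following minimal Python sketch may help. The function name `expected_count` and all numeric values are hypothetical placeholders chosen for illustration; they are not the fitted estimates from the summary, and the sketch only assumes the model form given above, where the expected count is the group size times exp(linear predictor).

```python
import math


def expected_count(n, intercept, effects):
    """Expected count for a group of size n under a log-link Poisson model
    with offset log(n): E(count) = n * exp(intercept + sum(effects)).

    `effects` holds the coefficients of the non-baseline levels that apply
    to the group; baseline levels contribute nothing to the sum.
    """
    linear_predictor = intercept + sum(effects)
    return n * math.exp(linear_predictor)


# Hypothetical placeholder values, NOT the estimates from the R summary.
intercept = -0.5
effects = [0.3, -0.2]

per_person = expected_count(1, intercept, effects)    # expected infections per swimmer
group_total = expected_count(20, intercept, effects)  # expected infections among 20 swimmers

print(per_person, group_total)
```

Because the offset enters with coefficient 1, the group size simply scales the per-person mean: the value for 20 swimmers is 20 times the value for a single swimmer.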
Now, suppose that we allow for interaction between variables age and sex. Give the R command for fitting this model. We test for the effect of this interaction by producing the following (abbreviated) ANOVA table:
Briefly explain what test is performed, and what you would conclude from it. Does either of these models fit the data well?