Paper 4, Section II, K
For 31 days after the outbreak of the 2014 Ebola epidemic, the World Health Organization recorded the number of new cases per day in 60 hospitals in West Africa. Researchers are interested in modelling $Y_{ij}$, the number of new Ebola cases in hospital $i$ on day $j$, as a function of several covariates:
lab: a Boolean factor for whether the hospital has laboratory facilities,
casesBefore: number of cases at the hospital on the previous day,
urban: a Boolean factor indicating an urban area,
country: a factor with three categories, Guinea, Liberia, and Sierra Leone,
numDoctors: number of doctors at the hospital,
tradBurials: a Boolean factor indicating whether traditional burials are common in the region.
Consider the output of the following code (with some lines omitted):
fit.1 <- glm(newCases ~ lab+casesBefore+urban+country+numDoctors+tradBurials,
             data=ebola, family=poisson)
summary(fit.1)
Coefficients:
Estimate Std. Error z value
casesBefore
countryLiberia
countrySierra Leone
numDoctors
tradBurialsTRUE
Signif. codes:
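For reference, the model fitted by the call above can be written out explicitly. This is a sketch under stated assumptions: the symbols $Y_{ij}$ (the value of newCases for hospital $i$ on day $j$), $\mu_{ij}$ and $\beta_0, \dots, \beta_7$ are notation introduced here, the link is the default log link for family=poisson, and Guinea is taken as the baseline country, since only Liberia and Sierra Leone appear in the coefficient table.

```latex
% Poisson log-linear model corresponding to fit.1 (notation is illustrative)
\[
  Y_{ij} \sim \mathrm{Poisson}(\mu_{ij}) \quad \text{independently},
\]
\[
  \log \mu_{ij}
    = \beta_0
    + \beta_1\,\mathtt{lab}_{ij}
    + \beta_2\,\mathtt{casesBefore}_{ij}
    + \beta_3\,\mathtt{urban}_{ij}
    + \beta_4\,\mathbf{1}\{\mathtt{country}_{ij} = \text{Liberia}\}
    + \beta_5\,\mathbf{1}\{\mathtt{country}_{ij} = \text{Sierra Leone}\}
    + \beta_6\,\mathtt{numDoctors}_{ij}
    + \beta_7\,\mathtt{tradBurials}_{ij}.
\]
```

Each Boolean factor enters as a 0/1 indicator, so its coefficient is the change in the log of the expected number of new cases when the factor is TRUE, with all other covariates held fixed.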
(a) Would you conclude based on the $z$-tests that an urban setting does not affect the rate of infection?
(b) Explain how you would predict the total number of new cases that the researchers will record in Sierra Leone on day 32.
We fit a new model which includes an interaction term, and compute a test statistic using the code:
fit.2 <- glm(newCases ~ casesBefore+country+country:casesBefore+numDoctors,
             data=ebola, family=poisson)
fit.2$deviance - fit.1$deviance
[1]
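As a reminder of what the last line computes: the deviance component of a glm object is the residual deviance of the fitted model. A sketch of the definition for a Poisson model follows, where $y_k$ denotes the $k$-th observed count, $\hat\mu_k^{(m)}$ its fitted mean under model $m$, and $D_m$, $T$ are labels introduced here for illustration.

```latex
% Residual deviance of a Poisson GLM, with the convention y log y = 0 when y = 0
\[
  D_m = 2 \sum_{k} \left[ y_k \log\frac{y_k}{\hat\mu_k^{(m)}}
        - \bigl(y_k - \hat\mu_k^{(m)}\bigr) \right],
  \qquad m = 1, 2,
\]
% Statistic printed by the last line of the code
\[
  T = D_2 - D_1 .
\]
```

Here $D_1$ and $D_2$ are the residual deviances of fit.1 and fit.2 respectively, both computed from the same response newCases.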
(c) What is the distribution of the statistic computed in the last line?
(d) Under what conditions is the deviance of each model approximately chi-squared?