Paper 4, Section II, K

Statistical Modelling
Part II, 2016

For 31 days after the outbreak of the 2014 Ebola epidemic, the World Health Organization recorded the number of new cases per day in 60 hospitals in West Africa. Researchers are interested in modelling YijY_{i j}, the number of new Ebola cases in hospital ii on day j2j \geqslant 2, as a function of several covariates:

  • lab: a Boolean factor for whether the hospital has laboratory facilities,

  • casesBefore: number of cases at the hospital on the previous day,

  • urban: a Boolean factor indicating an urban area,

  • country: a factor with three categories, Guinea, Liberia, and Sierra Leone,

  • numDoctors: number of doctors at the hospital,

  • tradBurials: a Boolean factor indicating whether traditional burials are common in the region.

Consider the output of the following RR code (with some lines omitted):

fit. 1 <- glm(newCases lab+casesBefore+urban+country+numDoctors+tradBurials,

  • data=ebola, family=poisson)

>> summary (fit.1)

Coefficients:

Estimate Std. Error z value Pr(>z)\operatorname{Pr}(>|z|)

 labTRUE 0.0947310.0503221.8820.0598 (Intercept) 0.0112980.0494980.2280.8195\begin{array}{lllll}\text { labTRUE } & 0.094731 & 0.050322 & 1.882 & 0.0598 \\ \text { (Intercept) } & 0.011298 & 0.049498 & 0.228 & 0.8195\end{array}

casesBefore 0.3247440.00775241.891<2e16\quad 0.324744 \quad 0.007752 \quad 41.891<2 \mathrm{e}-16 * * *

 urbanTRUE 0.0915540.0882121.0380.2993\begin{array}{llllll}\text { urbanTRUE } & -0.091554 & 0.088212 & -1.038 & 0.2993\end{array}

countryLiberia 0.0884900.0341192.5940.0095\quad 0.088490 \quad 0.034119 \quad 2.594 \quad 0.0095 * *

countrySierra Leone 0.1974740.0369695.3429.21e08-0.197474 \quad 0.036969-5.3429 .21 \mathrm{e}-08 * * *

numDoctors 0.0208190.0046584.4707.83e06\quad-0.020819 \quad 0.004658-4.4707 .83 \mathrm{e}-06 * * *

tradBurialstrUE 0.0542960.0316761.7140.0865.\quad 0.054296 \quad 0.031676 \quad 1.714 \quad 0.0865 .

Signif. codes: 0,0.0011,0.01,,0.05,0.1,10{ }^{\prime} * * *, 0.0011^{\prime} * *, 0.01, *, 0.05, \ldots 0.1,1

(a) Would you conclude based on the zz-tests that an urban setting does not affect the rate of infection?

(b) Explain how you would predict the total number of new cases that the researchers will record in Sierra Leone on day 32 .

We fit a new model which includes an interaction term, and compute a test statistic using the code:

>> fit. 2<2<- glm (newCases \sim casesBefore+country+country:casesBefore+numDoctors,

  • data=ebola, family=poisson)

fit. 2 deviance - fit.1$deviance

[1] 3.0161383.016138

(c) What is the distribution of the statistic computed in the last line?

(d) Under what conditions is the deviance of each model approximately chi-squared?