Paper 1, Section II, J

Statistical Modelling
Part II, 2018

A clinical study follows a number of patients with an illness. Let $Y_i \in [0, \infty)$ be the length of time that patient $i$ lives and $x_i \in \mathbb{R}^p$ a vector of predictors, for $i \in \{1, \ldots, n\}$. We shall assume that $Y_1, \ldots, Y_n$ are independent. Let $f_i$ and $F_i$ be the probability density function and cumulative distribution function, respectively, of $Y_i$. The hazard function $h_i$ is defined as

$$h_i(t) = \frac{f_i(t)}{1 - F_i(t)} \quad \text{for } t \geqslant 0.$$
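As a small illustration of the definition (an example added here, not part of the question), suppose $Y_i$ is exponential with rate $\theta > 0$. Then $f_i(t) = \theta e^{-\theta t}$ and $1 - F_i(t) = e^{-\theta t}$, so the hazard is constant in $t$:

$$h_i(t) = \frac{\theta e^{-\theta t}}{e^{-\theta t}} = \theta .$$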

We shall assume that $h_i(t) = \lambda(t) \exp(\beta^\top x_i)$, where $\beta \in \mathbb{R}^p$ is a vector of coefficients and $\lambda(t)$ is some fixed hazard function.
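One consequence worth noting (an observation added here, assuming $\lambda(t) > 0$): the hazards of any two patients $i$ and $j$ are proportional, with a ratio that does not depend on $t$,

$$\frac{h_i(t)}{h_j(t)} = \frac{\lambda(t) \exp(\beta^\top x_i)}{\lambda(t) \exp(\beta^\top x_j)} = \exp\left(\beta^\top (x_i - x_j)\right).$$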

(a) Prove that $F_i(t) = 1 - \exp\left(-\int_0^t h_i(s)\, ds\right)$.
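A sketch of one standard argument, assuming $F_i$ is differentiable with $F_i' = f_i$ and $F_i(t) < 1$. Since $Y_i$ has a density on $[0, \infty)$, $F_i(0) = P(Y_i \leqslant 0) = 0$, and the hazard is a logarithmic derivative:

$$h_i(s) = \frac{F_i'(s)}{1 - F_i(s)} = -\frac{d}{ds} \log\bigl(1 - F_i(s)\bigr).$$

Integrating from $0$ to $t$ gives

$$\int_0^t h_i(s)\, ds = -\log\bigl(1 - F_i(t)\bigr) + \log\bigl(1 - F_i(0)\bigr) = -\log\bigl(1 - F_i(t)\bigr),$$

and exponentiating and rearranging yields $F_i(t) = 1 - \exp\left(-\int_0^t h_i(s)\, ds\right)$.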

(b) Using the equation in part (a), write the log-likelihood function for $\beta$ in terms of $\lambda$, $\beta$, $x_i$ and $Y_i$ only.
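One way to write it down, sketched under the assumption that every $Y_i$ is observed exactly (no censoring). Rearranging the definition of the hazard gives $f_i(t) = h_i(t)\{1 - F_i(t)\}$, and under the model $\int_0^t h_i(s)\, ds = e^{\beta^\top x_i} \int_0^t \lambda(s)\, ds$, so part (a) yields

$$f_i(t) = \lambda(t)\, e^{\beta^\top x_i} \exp\left(-e^{\beta^\top x_i} \int_0^t \lambda(s)\, ds\right),$$

$$\ell(\beta) = \sum_{i=1}^n \left[\log \lambda(Y_i) + \beta^\top x_i - e^{\beta^\top x_i} \int_0^{Y_i} \lambda(s)\, ds\right].$$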

(c) Show that the maximum likelihood estimate of $\beta$ can be obtained through a surrogate Poisson generalised linear model with an offset.
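A sketch of the key identity, assuming $\lambda$ is known and writing $\Lambda(t) = \int_0^t \lambda(s)\, ds$ (shorthand introduced here). Take surrogate responses $z_i = 1$ modelled as Poisson with log link and offset $\log \Lambda(Y_i)$, so that $\mu_i = \Lambda(Y_i) e^{\beta^\top x_i}$. The Poisson log-likelihood is then

$$\sum_{i=1}^n \left[z_i \log \mu_i - \mu_i - \log z_i!\right] = \sum_{i=1}^n \left[\log \Lambda(Y_i) + \beta^\top x_i - e^{\beta^\top x_i} \Lambda(Y_i)\right],$$

which differs from $\ell(\beta)$ only by terms that do not involve $\beta$, so both are maximised by the same $\beta$. The Python/NumPy code below is a minimal numerical check of this, not part of the original question. It assumes a baseline $\lambda(t) = 2t$ (so $\Lambda(t) = t^2$), simulates uncensored survival times, fits the surrogate Poisson model by Newton-Raphson, and evaluates the gradient of the survival log-likelihood at the result. The helper names `fit_poisson_offset` and `survival_score` are hypothetical.

```python
import numpy as np


def fit_poisson_offset(X, z, offset, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a Poisson GLM with log link and known offset:
    log(mu_i) = offset_i + x_i^T beta. (Hypothetical helper.)"""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        score = X.T @ (z - mu)           # gradient of the Poisson log-likelihood
        info = X.T @ (mu[:, None] * X)   # Fisher information X^T diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def survival_score(beta, X, Lambda_Y):
    """Gradient of the beta-dependent part of the survival log-likelihood,
    sum_i [beta^T x_i - exp(beta^T x_i) * Lambda(Y_i)]. (Hypothetical helper.)"""
    return X.T @ (1.0 - np.exp(X @ beta) * Lambda_Y)


rng = np.random.default_rng(0)
n = 5000
beta_true = np.array([0.5, -1.0])
X = rng.normal(size=(n, 2))

# Assumed baseline hazard lambda(t) = 2t, so Lambda(t) = t^2.
# By part (a), Lambda(Y_i) * exp(beta^T x_i) ~ Exp(1), which gives Y_i below.
E = rng.exponential(size=n)
Y = np.sqrt(E * np.exp(-X @ beta_true))
Lambda_Y = Y**2

# Surrogate Poisson GLM: every response is 1, the offset is log Lambda(Y_i).
beta_hat = fit_poisson_offset(X, z=np.ones(n), offset=np.log(Lambda_Y))
print("beta_hat:", beta_hat)
print("survival score at beta_hat:", survival_score(beta_hat, X, Lambda_Y))
```

With $z_i = 1$ the Poisson score $X^\top(z - \mu)$ coincides term by term with the gradient of $\ell(\beta)$, so the second printed vector should be zero up to rounding. The same response and offset could instead be passed to any GLM routine that supports an offset.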