Paper 1, Section I, J

Statistical Modelling
Part II, 2020

Consider a generalised linear model with full column rank design matrix $X \in \mathbb{R}^{n \times p}$, output variables $Y=\left(Y_{1}, \ldots, Y_{n}\right) \in \mathbb{R}^{n}$, link function $g$, mean parameters $\mu=\left(\mu_{1}, \ldots, \mu_{n}\right)$ and known dispersion parameters $\sigma_{i}^{2}=a_{i} \sigma^{2}$, $i=1, \ldots, n$. Denote its variance function by $V$ and recall that $g\left(\mu_{i}\right)=x_{i}^{T} \beta$, $i=1, \ldots, n$, where $\beta \in \mathbb{R}^{p}$ and $x_{i}^{T}$ is the $i^{\text{th}}$ row of $X$.
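For concreteness, one standard instance of this setup is the Poisson model with canonical log link, for which
$$Y_{i} \sim \mathrm{Poisson}\left(\mu_{i}\right), \qquad g\left(\mu_{i}\right)=\log \mu_{i}=x_{i}^{T} \beta, \qquad V\left(\mu_{i}\right)=\mu_{i}, \qquad a_{i}=\sigma^{2}=1.$$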

(a) Define the score function in terms of the log-likelihood function, define the Fisher information matrix, and define the update of the Fisher scoring algorithm.
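For reference, writing $\ell(\beta)$ for the log-likelihood, the standard definitions (under the usual regularity conditions) are
$$U(\beta)=\nabla_{\beta} \ell(\beta), \qquad i(\beta)=\mathbb{E}_{\beta}\left[U(\beta) U(\beta)^{T}\right]=-\mathbb{E}_{\beta}\left[\nabla_{\beta}^{2} \ell(\beta)\right],$$
and the Fisher scoring update is
$$\beta^{(t+1)}=\beta^{(t)}+i\left(\beta^{(t)}\right)^{-1} U\left(\beta^{(t)}\right).$$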

(b) Let $W \in \mathbb{R}^{n \times n}$ be a diagonal matrix with positive entries. Note that $X^{T} W X$ is invertible. Show that

$$\operatorname{argmin}_{b \in \mathbb{R}^{p}}\left\{\sum_{i=1}^{n} W_{i i}\left(Y_{i}-x_{i}^{T} b\right)^{2}\right\}=\left(X^{T} W X\right)^{-1} X^{T} W Y.$$

[Hint: you may use that $\operatorname{argmin}_{b \in \mathbb{R}^{p}}\left\{\|Y-X b\|^{2}\right\}=\left(X^{T} X\right)^{-1} X^{T} Y$.]
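One way to see this (a sketch using the hint): since $W$ is diagonal with positive entries, set $W^{1/2}=\operatorname{diag}\left(\sqrt{W_{11}}, \ldots, \sqrt{W_{nn}}\right)$, $\tilde{Y}=W^{1/2} Y$ and $\tilde{X}=W^{1/2} X$; then
$$\sum_{i=1}^{n} W_{i i}\left(Y_{i}-x_{i}^{T} b\right)^{2}=\left\|\tilde{Y}-\tilde{X} b\right\|^{2},$$
so the hint yields the minimiser $\left(\tilde{X}^{T} \tilde{X}\right)^{-1} \tilde{X}^{T} \tilde{Y}=\left(X^{T} W X\right)^{-1} X^{T} W Y$.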

(c) Recall that the score function and the Fisher information matrix have entries

$$\begin{aligned}
U_{j}(\beta) &= \sum_{i=1}^{n} \frac{\left(Y_{i}-\mu_{i}\right) X_{i j}}{a_{i} \sigma^{2} V\left(\mu_{i}\right) g^{\prime}\left(\mu_{i}\right)}, \quad j=1, \ldots, p, \\
i_{j k}(\beta) &= \sum_{i=1}^{n} \frac{X_{i j} X_{i k}}{a_{i} \sigma^{2} V\left(\mu_{i}\right)\left\{g^{\prime}\left(\mu_{i}\right)\right\}^{2}}, \quad j, k=1, \ldots, p.
\end{aligned}$$

Justify, performing the necessary calculations and using part (b), why the Fisher scoring algorithm is also known as the iterative reweighted least squares algorithm.
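In outline, the matrix form of the above is $i(\beta)=X^{T} W X$ and $U(\beta)=X^{T} W v$, where $W_{i i}=1 /\left(a_{i} \sigma^{2} V\left(\mu_{i}\right)\left\{g^{\prime}\left(\mu_{i}\right)\right\}^{2}\right)$ and $v_{i}=\left(Y_{i}-\mu_{i}\right) g^{\prime}\left(\mu_{i}\right)$, so each Fisher scoring step is the weighted least squares fit of part (b) applied to the working response $Z=X \beta+v$. The following is a minimal numerical sketch of that iteration, assuming for concreteness a Poisson model with log link (so $V(\mu)=\mu$, $g^{\prime}(\mu)=1 / \mu$ and $a_{i}=\sigma^{2}=1$); the function name `irls_poisson` and the simulated data are illustrative only.

```python
import numpy as np

def irls_poisson(X, Y, n_iter=25, tol=1e-8):
    """Fisher scoring / IRLS for a Poisson GLM with log link.

    Each step solves the weighted least squares problem
    beta_new = (X^T W X)^{-1} X^T W Z with working response Z,
    matching the identity of part (b) applied to part (c).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta                  # linear predictor x_i^T beta
        mu = np.exp(eta)                # inverse link: mu_i = exp(eta_i)
        # For the log link, g'(mu) = 1/mu and V(mu) = mu, so the
        # weights W_ii = 1 / (V(mu_i) g'(mu_i)^2) reduce to mu_i.
        W = mu
        # Working response: Z_i = eta_i + (Y_i - mu_i) g'(mu_i).
        Z = eta + (Y - mu) / mu
        # Weighted least squares step: solve (X^T W X) beta = X^T W Z
        # rather than forming the inverse explicitly.
        XtW = X.T * W                   # equals X^T diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ Z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Illustrative usage on simulated data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([0.5, -0.3])
Y = rng.poisson(np.exp(X @ beta_true))
print(irls_poisson(X, Y))
```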