Paper 1, Section I, J

Statistical Modelling
Part II, 2020

Consider a generalised linear model with full column rank design matrix $X \in \mathbb{R}^{n \times p}$, output variables $Y=\left(Y_{1}, \ldots, Y_{n}\right) \in \mathbb{R}^{n}$, link function $g$, mean parameters $\mu=\left(\mu_{1}, \ldots, \mu_{n}\right)$ and known dispersion parameters $\sigma_{i}^{2}=a_{i} \sigma^{2}$, $i=1, \ldots, n$. Denote its variance function by $V$ and recall that $g\left(\mu_{i}\right)=x_{i}^{T} \beta$, $i=1, \ldots, n$, where $\beta \in \mathbb{R}^{p}$ and $x_{i}^{T}$ is the $i^{\text{th}}$ row of $X$.
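For concreteness, one standard instance of this setup is the Poisson model with canonical log link, for which
$$Y_{i} \sim \mathrm{Poisson}\left(\mu_{i}\right), \qquad g\left(\mu_{i}\right)=\log \mu_{i}=x_{i}^{T} \beta, \qquad V\left(\mu_{i}\right)=\mu_{i}, \qquad a_{i}=\sigma^{2}=1.$$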

(a) Define the score function in terms of the log-likelihood function, define the Fisher information matrix, and define the update of the Fisher scoring algorithm.
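For reference, writing $\ell(\beta)$ for the log-likelihood, the standard definitions (under the usual regularity conditions) are
$$U(\beta)=\nabla_{\beta} \ell(\beta), \qquad i(\beta)=\mathbb{E}_{\beta}\left[U(\beta) U(\beta)^{T}\right]=-\mathbb{E}_{\beta}\left[\nabla_{\beta}^{2} \ell(\beta)\right],$$
and the Fisher scoring update is
$$\beta^{(t+1)}=\beta^{(t)}+i\left(\beta^{(t)}\right)^{-1} U\left(\beta^{(t)}\right).$$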

(b) Let $W \in \mathbb{R}^{n \times n}$ be a diagonal matrix with positive entries. Note that $X^{T} W X$ is invertible. Show that

$$\operatorname{argmin}_{b \in \mathbb{R}^{p}}\left\{\sum_{i=1}^{n} W_{i i}\left(Y_{i}-x_{i}^{T} b\right)^{2}\right\}=\left(X^{T} W X\right)^{-1} X^{T} W Y.$$

[Hint: you may use that $\operatorname{argmin}_{b \in \mathbb{R}^{p}}\left\{\|Y-X b\|^{2}\right\}=\left(X^{T} X\right)^{-1} X^{T} Y$.]
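One way to see this (a sketch using the hint): since $W$ is diagonal with positive entries, set $W^{1/2}=\operatorname{diag}\left(\sqrt{W_{11}}, \ldots, \sqrt{W_{nn}}\right)$, $\tilde{Y}=W^{1/2} Y$ and $\tilde{X}=W^{1/2} X$; then
$$\sum_{i=1}^{n} W_{i i}\left(Y_{i}-x_{i}^{T} b\right)^{2}=\left\|\tilde{Y}-\tilde{X} b\right\|^{2},$$
so the hint yields the minimiser $\left(\tilde{X}^{T} \tilde{X}\right)^{-1} \tilde{X}^{T} \tilde{Y}=\left(X^{T} W X\right)^{-1} X^{T} W Y$.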

(c) Recall that the score function and the Fisher information matrix have entries

$$\begin{aligned}
U_{j}(\beta) &= \sum_{i=1}^{n} \frac{\left(Y_{i}-\mu_{i}\right) X_{i j}}{a_{i} \sigma^{2} V\left(\mu_{i}\right) g^{\prime}\left(\mu_{i}\right)}, \quad j=1, \ldots, p, \\
i_{j k}(\beta) &= \sum_{i=1}^{n} \frac{X_{i j} X_{i k}}{a_{i} \sigma^{2} V\left(\mu_{i}\right)\left\{g^{\prime}\left(\mu_{i}\right)\right\}^{2}}, \quad j, k=1, \ldots, p.
\end{aligned}$$

Justify, performing the necessary calculations and using part (b), why the Fisher scoring algorithm is also known as the iterative reweighted least squares algorithm.
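In outline, the matrix form of the above is $i(\beta)=X^{T} W X$ and $U(\beta)=X^{T} W v$, where $W_{i i}=1 /\left(a_{i} \sigma^{2} V\left(\mu_{i}\right)\left\{g^{\prime}\left(\mu_{i}\right)\right\}^{2}\right)$ and $v_{i}=\left(Y_{i}-\mu_{i}\right) g^{\prime}\left(\mu_{i}\right)$, so each Fisher scoring step is the weighted least squares fit of part (b) applied to the working response $Z=X \beta+v$. The following is a minimal numerical sketch of that iteration, assuming for concreteness a Poisson model with log link (so $V(\mu)=\mu$, $g^{\prime}(\mu)=1 / \mu$ and $a_{i}=\sigma^{2}=1$); the function name `irls_poisson` and the simulated data are illustrative only.

```python
import numpy as np

def irls_poisson(X, Y, n_iter=25, tol=1e-8):
    """Fisher scoring / IRLS for a Poisson GLM with log link.

    Each step solves the weighted least squares problem
    beta_new = (X^T W X)^{-1} X^T W Z with working response Z,
    matching the identity of part (b) applied to part (c).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta                  # linear predictor x_i^T beta
        mu = np.exp(eta)                # inverse link: mu_i = exp(eta_i)
        # For the log link, g'(mu) = 1/mu and V(mu) = mu, so the
        # weights W_ii = 1 / (V(mu_i) g'(mu_i)^2) reduce to mu_i.
        W = mu
        # Working response: Z_i = eta_i + (Y_i - mu_i) g'(mu_i).
        Z = eta + (Y - mu) / mu
        # Weighted least squares step: solve (X^T W X) beta = X^T W Z
        # rather than forming the inverse explicitly.
        XtW = X.T * W                   # equals X^T diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ Z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Illustrative usage on simulated data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([0.5, -0.3])
Y = rng.poisson(np.exp(X @ beta_true))
print(irls_poisson(X, Y))
```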