Paper 4, Section I, K

Statistical Modelling
Part II, 2016

(a) Let $Y_i = x_i^\top \beta + \varepsilon_i$ where $\varepsilon_i$ for $i = 1, \ldots, n$ are independent and identically distributed. Let $Z_i = I(Y_i < 0)$ for $i = 1, \ldots, n$, and suppose that these variables follow a binary regression model with the complementary log-log link function $g(\mu) = \log(-\log(1 - \mu))$. What is the probability density function of $\varepsilon_1$?
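A worked step that may help orient the reader (it follows directly from the definitions above and is not an addition to the question itself): the mean of $Z_i$ is $\mu_i = P(Z_i = 1) = P(Y_i < 0) = P(\varepsilon_i < -x_i^\top \beta)$, and inverting the link gives

$$g(\mu_i) = x_i^\top \beta \quad\Longleftrightarrow\quad \mu_i = 1 - \exp\bigl(-\exp(x_i^\top \beta)\bigr),$$

so the model ties the inverse link directly to the distribution function of $\varepsilon_1$.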

(b) The Newton-Raphson algorithm can be applied to compute the MLE, $\hat{\beta}$, in certain GLMs. Starting from $\beta^{(0)} = 0$, we let $\beta^{(t+1)}$ be the maximizer of the quadratic approximation of the log-likelihood $\ell(\beta; Y)$ around $\beta^{(t)}$:

$$\ell(\beta; Y) \approx \ell(\beta^{(t)}; Y) + (\beta - \beta^{(t)})^\top D\ell(\beta^{(t)}; Y) + \frac{1}{2}(\beta - \beta^{(t)})^\top D^2\ell(\beta^{(t)}; Y)(\beta - \beta^{(t)}),$$

where $D\ell$ and $D^2\ell$ are the gradient and Hessian of the log-likelihood. What is the difference between this algorithm and Iterative Weighted Least Squares? Why might the latter be preferable?
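For orientation, maximizing the quadratic approximation above yields the Newton-Raphson update

$$\beta^{(t+1)} = \beta^{(t)} - \bigl[D^2\ell(\beta^{(t)}; Y)\bigr]^{-1} D\ell(\beta^{(t)}; Y).$$

Iterative Weighted Least Squares (Fisher scoring) instead replaces the observed Hessian $D^2\ell$ by its expectation, $-X^\top W X$, so that each iteration reduces to solving a weighted least squares problem with non-negative weights. Below is a minimal NumPy sketch for a Bernoulli GLM with the complementary log-log link of part (a); the function name `iwls_cloglog`, the clipping guard and the convergence tolerance are illustrative choices, not anything fixed by the question.

```python
import numpy as np

def iwls_cloglog(X, y, max_iter=25, tol=1e-10):
    """IWLS / Fisher scoring for a Bernoulli GLM with cloglog link.

    Each iteration solves a weighted least squares problem, using the
    expected information X^T W X in place of the observed Hessian.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 - np.exp(-np.exp(eta))        # inverse cloglog link
        mu = np.clip(mu, 1e-10, 1.0 - 1e-10)   # keep mu away from 0 and 1
        dmu = (1.0 - mu) * np.exp(eta)         # d mu / d eta
        w = dmu**2 / (mu * (1.0 - mu))         # IWLS weights, always >= 0
        z = eta + (y - mu) / dmu               # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Simulated check: recover beta = (-0.5, 1.0) under the binary model above.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
p = 1.0 - np.exp(-np.exp(X @ np.array([-0.5, 1.0])))
y = rng.binomial(1, p)
print(iwls_cloglog(X, y))
```

For a canonical link (e.g. the logit for Bernoulli data) the observed and expected information coincide, so the two algorithms generate identical iterates; for a non-canonical link such as the cloglog they differ, and the guaranteed non-negative weights are one reason IWLS can be the more stable choice.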