(a) Let $Y_i = x_i^\top \beta + \varepsilon_i$, where the errors $\varepsilon_i$ for $i = 1, \dots, n$ are independent and identically distributed. Let $Z_i = I(Y_i < 0)$ for $i = 1, \dots, n$, and suppose that these variables follow a binary regression model with the complementary log-log link function $g(\mu) = \log(-\log(1-\mu))$. What is the probability density function of $\varepsilon_1$?
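As a quick numerical sanity check on the link function above, the sketch below (function names are my own, for illustration) verifies that $g(\mu) = \log(-\log(1-\mu))$ and $g^{-1}(\eta) = 1 - \exp(-e^\eta)$ are inverses on a grid of linear-predictor values:

```python
import numpy as np

def cloglog(mu):
    # Complementary log-log link g(mu) = log(-log(1 - mu))
    return np.log(-np.log(1.0 - mu))

def inv_cloglog(eta):
    # Inverse link: mu = 1 - exp(-exp(eta))
    return 1.0 - np.exp(-np.exp(eta))

eta = np.linspace(-3.0, 3.0, 13)
roundtrip = cloglog(inv_cloglog(eta))
```

The round trip $g(g^{-1}(\eta)) = \eta$ holds to floating-point precision.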
(b) The Newton-Raphson algorithm can be applied to compute the MLE, $\hat\beta$, in certain GLMs. Starting from $\beta^{(0)} = 0$, we let $\beta^{(t+1)}$ be the maximizer of the quadratic approximation of the log-likelihood $\ell(\beta; Y)$ around $\beta^{(t)}$:
\[
\ell(\beta; Y) \approx \ell(\beta^{(t)}; Y) + (\beta - \beta^{(t)})^\top D\ell(\beta^{(t)}; Y) + \tfrac{1}{2}(\beta - \beta^{(t)})^\top D^2\ell(\beta^{(t)}; Y)(\beta - \beta^{(t)}),
\]
where $D\ell$ and $D^2\ell$ denote the gradient and Hessian of the log-likelihood. What is the difference between this algorithm and Iterative Weighted Least Squares? Why might the latter be preferable?
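For concreteness, here is a minimal sketch of Iterative Weighted Least Squares (Fisher scoring) for the binary cloglog GLM of part (a), started from $\beta^{(0)} = 0$ as in the text. The function name, the synthetic data, and the numerical guards are my own; this is an illustration, not a production fitter:

```python
import numpy as np

def fit_cloglog_iwls(X, z, n_iter=100, tol=1e-10):
    # IWLS / Fisher scoring: at each step, solve a weighted least-squares
    # problem with expected-information weights and a working response.
    beta = np.zeros(X.shape[1])              # beta^(0) = 0, as in the text
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 - np.exp(-np.exp(eta))      # inverse cloglog link
        mu = np.clip(mu, 1e-12, 1 - 1e-12)   # guard against mu hitting 0 or 1
        dmu = np.exp(eta - np.exp(eta))      # d mu / d eta
        w = dmu**2 / (mu * (1.0 - mu))       # Fisher (expected-information) weights
        zwork = eta + (z - mu) / dmu         # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * zwork))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Check on simulated data (beta_true is arbitrary, chosen for illustration).
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
z = rng.binomial(1, 1.0 - np.exp(-np.exp(X @ beta_true)))
beta_hat = fit_cloglog_iwls(X, z)
```

At a fixed point of these iterations the score $X^\top \{(z - \mu)\,\mu'(\eta) / [\mu(1-\mu)]\}$ vanishes, so the returned $\hat\beta$ satisfies the likelihood equations. Note the weights use the expected information rather than the observed Hessian of Newton-Raphson; for the canonical link the two coincide, but not for cloglog.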