Suppose we have data $(Y_1, x_1^T), \ldots, (Y_n, x_n^T)$, where the $Y_i$ are independent conditional on the design matrix $X$ whose rows are the $x_i^T$, $i = 1, \ldots, n$. Suppose that given $x_i$, the true probability density function of $Y_i$ is $f_{x_i}$, so that the data are generated from an element of a model $F := \{(f_{x_i}(\cdot\,; \theta))_{i=1}^n, \ \theta \in \Theta\}$ for some $\Theta \subseteq \mathbb{R}^q$ and $q \in \mathbb{N}$.
(a) Define the log-likelihood function for $F$, the maximum likelihood estimator of $\theta$, and Akaike's Information Criterion (AIC) for $F$.
From now on let $F$ be the normal linear model, i.e. $Y := (Y_1, \ldots, Y_n)^T = X\beta + \varepsilon$, where $X \in \mathbb{R}^{n \times p}$ has full column rank and $\varepsilon \sim N_n(0, \sigma^2 I)$.
(b) Let $\hat{\sigma}^2$ denote the maximum likelihood estimator of $\sigma^2$. Show that the AIC of $F$ is
$$n\bigl(1 + \log(2\pi\hat{\sigma}^2)\bigr) + 2(p+1).$$
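The closed form in (b) can be checked numerically. The sketch below is illustrative only: it assumes the convention AIC $= -2 \times$ (maximised log-likelihood) $+ 2 \times$ (number of parameters), with $p + 1$ parameters ($\beta$ and $\sigma^2$), and uses simulated data. The function name `aic_normal_linear` and all numerical values are hypothetical.

```python
import numpy as np

def aic_normal_linear(X, y):
    """Return (AIC from the maximised log-likelihood, closed-form AIC).

    Assumes the convention AIC = -2 * loglik(theta_hat) + 2 * (p + 1).
    """
    n, p = X.shape
    # The MLE of beta in the normal linear model is the least squares estimator.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    rss = resid @ resid
    sigma2_hat = rss / n  # MLE of sigma^2 divides by n, not n - p
    # Gaussian log-likelihood evaluated at (beta_hat, sigma2_hat)
    loglik = -0.5 * n * np.log(2 * np.pi * sigma2_hat) - rss / (2 * sigma2_hat)
    aic_direct = -2 * loglik + 2 * (p + 1)
    aic_closed = n * (1 + np.log(2 * np.pi * sigma2_hat)) + 2 * (p + 1)
    return aic_direct, aic_closed

# Simulated data (arbitrary illustrative values)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=n)

aic_direct, aic_closed = aic_normal_linear(X, y)
print(aic_direct, aic_closed)  # the two values should agree up to rounding
```

The two values agree because the residual sum of squares equals $n\hat{\sigma}^2$, so the quadratic term in the log-likelihood reduces to $n/2$ at the maximum.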
(c) Let $\chi^2_{n-p}$ be a chi-squared distribution on $n - p$ degrees of freedom. Using any results from the course, show that the distribution of the AIC of $F$ is
$$n\log(\chi^2_{n-p}) + n\bigl(\log(2\pi\sigma^2/n) + 1\bigr) + 2(p+1).$$
[Hint: $\hat{\sigma}^2 := n^{-1}\|Y - X\hat{\beta}\|^2 = n^{-1}\|(I - P)\varepsilon\|^2$, where $\hat{\beta}$ is the maximum likelihood estimator of $\beta$ and $P$ is the projection matrix onto the column space of $X$.]
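The distributional claim in (c) can be sanity-checked by simulation. The sketch below is a rough Monte Carlo comparison and not a proof: it uses the identity in the hint to compute $\hat{\sigma}^2$ directly from $(I - P)\varepsilon$, then compares the resulting AIC values with independent draws from the stated distribution. The design matrix, $\sigma^2$, the number of replications and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2, reps = 50, 3, 2.0, 20000  # illustrative values only

X = rng.normal(size=(n, p))                # a fixed full-column-rank design
P = X @ np.linalg.solve(X.T @ X, X.T)      # projection onto the column space of X
M = np.eye(n) - P                          # I - P, symmetric and idempotent

# By the hint, Y - X beta_hat = (I - P) eps, so beta itself is not needed.
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
resid = eps @ M                            # each row is ((I - P) eps)^T, as M is symmetric
sigma2_hat = (resid ** 2).sum(axis=1) / n  # MLE of sigma^2 for each replication
aic_sim = n * (1 + np.log(2 * np.pi * sigma2_hat)) + 2 * (p + 1)

# Independent draws from the distribution stated in (c)
chi2 = rng.chisquare(n - p, size=reps)
aic_theory = n * np.log(chi2) + n * (np.log(2 * np.pi * sigma2 / n) + 1) + 2 * (p + 1)

print(aic_sim.mean(), aic_theory.mean())
print(np.quantile(aic_sim, [0.1, 0.5, 0.9]))
print(np.quantile(aic_theory, [0.1, 0.5, 0.9]))
```

The means and quantiles of the two samples should be close, up to Monte Carlo error, which is consistent with $\|(I - P)\varepsilon\|^2 / \sigma^2 \sim \chi^2_{n-p}$.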