Paper 4, Section II, K

Principles of Statistics
Part II, 2011

What does it mean to say that a $(1 \times p)$ random vector $\xi$ has a multivariate normal distribution?

Suppose $\xi = (X, Y)$ has the bivariate normal distribution with mean vector $\mu = (\mu_X, \mu_Y)$ and dispersion matrix

$$\Sigma = \left(\begin{array}{cc} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{array}\right).$$

Show that, with $\beta := \sigma_{XY}/\sigma_{XX}$, $Y - \beta X$ is independent of $X$, and thus that the conditional distribution of $Y$ given $X$ is normal with mean $\mu_Y + \beta(X - \mu_X)$ and variance $\sigma_{YY\cdot X} := \sigma_{YY} - \sigma_{XY}^2/\sigma_{XX}$.
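As a numerical sanity check (not part of the question), the sketch below simulates a bivariate normal by setting $X = \mu_X + a_{11}Z_1 + a_{12}Z_2$ and $Y = \mu_Y + a_{21}Z_1 + a_{22}Z_2$ with independent standard normals $Z_1, Z_2$, so that $\Sigma = AA^{\mathrm T}$ for the mixing matrix $A = (a_{jk})$. The values of `A` and `mu`, the seed and the helper names are hypothetical choices for illustration. Simulation can only show that $X$ and $Y - \beta X$ have sample covariance near zero and that $Y - \beta X$ has sample variance near $\sigma_{YY\cdot X}$; the step from zero covariance to independence uses the fact that $(X, Y - \beta X)$ is a linear transform of $\xi$ and hence jointly normal.

```python
import random


def simulate_pairs(n, mu, A, rng):
    """Draw n pairs with X = mu_X + a11*Z1 + a12*Z2 and Y = mu_Y + a21*Z1 + a22*Z2."""
    (a11, a12), (a21, a22) = A
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mu[0] + a11 * z1 + a12 * z2, mu[1] + a21 * z1 + a22 * z2))
    return pairs


def sample_cov(u, v):
    """Sample covariance with divisor len(u) - 1."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)


# Hypothetical parameters chosen for illustration; Sigma = A A^T.
A = ((2.0, 0.5), (1.0, 1.5))
mu = (1.0, -3.0)
sigma_xx = A[0][0] ** 2 + A[0][1] ** 2
sigma_xy = A[0][0] * A[1][0] + A[0][1] * A[1][1]
sigma_yy = A[1][0] ** 2 + A[1][1] ** 2
beta = sigma_xy / sigma_xx
sigma_yy_x = sigma_yy - sigma_xy ** 2 / sigma_xx

pairs = simulate_pairs(200_000, mu, A, random.Random(2011))
xs = [x for x, _ in pairs]
resid = [y - beta * x for x, y in pairs]  # Y - beta X, including its constant mean

print(sample_cov(xs, resid))                 # should be close to 0
print(sample_cov(resid, resid), sigma_yy_x)  # the two values should be close
```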

For $i = 1, \ldots, n$, $\xi_i = (X_i, Y_i)$ are independent and identically distributed with the above distribution, where all elements of $\mu$ and $\Sigma$ are unknown. Let

$$S = \left(\begin{array}{cc} S_{XX} & S_{XY} \\ S_{XY} & S_{YY} \end{array}\right) := \sum_{i=1}^{n} (\xi_i - \bar{\xi})^{\mathrm{T}} (\xi_i - \bar{\xi}),$$

where $\bar{\xi} := n^{-1} \sum_{i=1}^{n} \xi_i$.
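Because each $\xi_i$ is a $(1 \times 2)$ row vector, $(\xi_i - \bar\xi)^{\mathrm T}(\xi_i - \bar\xi)$ is a $2 \times 2$ outer product, so $S$ holds the corrected sums of squares and cross-products (with no division by $n$ or $n - 1$). A minimal pure-Python sketch returning its three distinct entries; the function name `scatter_matrix` and the toy data are hypothetical:

```python
def scatter_matrix(pairs):
    """Return (S_XX, S_XY, S_YY): sums of squares and products about the sample means."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xx, s_xy, s_yy


# Toy data made up for illustration.
print(scatter_matrix([(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]))  # (2.0, 1.0, 2.0)
```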

The sample correlation coefficient is $r := S_{XY}/\sqrt{S_{XX} S_{YY}}$. Show that the distribution of $r$ depends only on the population correlation coefficient $\rho := \sigma_{XY}/\sqrt{\sigma_{XX}\sigma_{YY}}$.
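One useful observation is that $r$ is unchanged when $X_i \mapsto a + bX_i$ and $Y_i \mapsto c + dY_i$ with $b, d > 0$. Choosing $b = 1/\sqrt{\sigma_{XX}}$, $a = -\mu_X/\sqrt{\sigma_{XX}}$ and similarly for $Y$ turns the data into standardized pairs whose dispersion matrix has unit diagonal and off-diagonal entry $\rho$. The sketch below only checks the invariance numerically on made-up data; `sample_corr` and `affine` are illustrative names, not from the question.

```python
import math


def sample_corr(pairs):
    """Sample correlation r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)


def affine(pairs, a, b, c, d):
    """Map each (x, y) to (a + b*x, c + d*y)."""
    return [(a + b * x, c + d * y) for x, y in pairs]


data = [(0.3, 1.2), (1.1, 0.7), (2.4, 2.9), (3.0, 2.2), (4.2, 4.8)]  # made-up values
r_original = sample_corr(data)
r_rescaled = sample_corr(affine(data, -5.0, 3.0, 10.0, 0.25))
print(math.isclose(r_original, r_rescaled))  # True: positive rescaling leaves r unchanged
```

A negative scale factor on one coordinate would flip the sign of $r$, which is why $b, d > 0$ is required.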

Student's $t$-statistic (on $n-2$ degrees of freedom) for testing the null hypothesis $H_0: \beta = 0$ is

$$t := \frac{\widehat{\beta}}{\sqrt{S_{YY\cdot X}/\{(n-2)\,S_{XX}\}}},$$

where $\widehat{\beta} := S_{XY}/S_{XX}$ and $S_{YY\cdot X} := S_{YY} - S_{XY}^2/S_{XX}$. Its density when $H_0$ is true is
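The definitions above translate directly into code. This is a minimal sketch, assuming the denominator is read as $\sqrt{S_{YY\cdot X}/\{(n-2)S_{XX}\}}$; the function name `t_statistic` and the toy data are hypothetical.

```python
import math


def t_statistic(pairs):
    """Student's t for H0: beta = 0, built from S_XX, S_XY, S_YY as in the question."""
    n = len(pairs)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    beta_hat = s_xy / s_xx
    s_yy_x = s_yy - s_xy ** 2 / s_xx  # residual sum of squares about the fitted line
    # Denominator read as sqrt(S_YY.X / ((n - 2) * S_XX)).
    return beta_hat / math.sqrt(s_yy_x / ((n - 2) * s_xx))


print(t_statistic([(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]))  # 1/sqrt(3), about 0.577
```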

$$p(t) = C\left(1 + \frac{t^2}{n-2}\right)^{-\frac{1}{2}(n-1)},$$

where $C$ is a constant that need not be specified.

Express $t$ in terms of $r$, and hence derive the density of $r$ when $\rho = 0$.
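One possible route, sketched as a worked derivation: rewrite $\widehat\beta$ and $S_{YY\cdot X}$ through $r$, substitute into $t$, then change variables in $p(t)$. This assumes $|r| < 1$, which holds with probability one for $n \ge 3$. Note that $\rho = 0$ exactly when $\sigma_{XY} = 0$, that is when $\beta = 0$, so $p(t)$ applies; and $t$ is strictly increasing in $r$, so the change of variables is one-to-one.

$$
\begin{aligned}
\widehat\beta &= r\sqrt{S_{YY}/S_{XX}}, \qquad S_{YY\cdot X} = S_{YY}(1-r^2),\\
t &= \frac{r\sqrt{S_{YY}/S_{XX}}}{\sqrt{S_{YY}(1-r^2)/\{(n-2)S_{XX}\}}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},\\
1+\frac{t^2}{n-2} &= \frac{1}{1-r^2}, \qquad \frac{dt}{dr} = \frac{\sqrt{n-2}}{(1-r^2)^{3/2}},\\
p(r) &= C\,(1-r^2)^{\frac12(n-1)}\cdot\frac{\sqrt{n-2}}{(1-r^2)^{3/2}} \;\propto\; (1-r^2)^{\frac12(n-4)}, \qquad -1<r<1.
\end{aligned}
$$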

How could you use the sample correlation $r$ to test the hypothesis $\rho = 0$?
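Since $\rho = 0$ exactly when $\beta = 0$, and $t = r\sqrt{n-2}/\sqrt{1-r^2}$ is increasing in $r$, a natural two-sided test rejects $\rho = 0$ when $|r|$ is large, which is equivalent to referring $t$ to the $t_{n-2}$ distribution. To avoid assuming a statistics library, the sketch below integrates the null density of $r$, $c_n(1-r^2)^{(n-4)/2}$ with $c_n = \Gamma(\tfrac{n-1}{2})/\{\sqrt{\pi}\,\Gamma(\tfrac{n-2}{2})\}$, by composite Simpson's rule to obtain a p-value. The function names, the step count and the restriction $n \ge 4$ (which keeps the integrand bounded) are choices made for this illustration.

```python
import math


def r_null_density(r, n):
    """Density of r under rho = 0: c_n * (1 - r^2)^((n - 4)/2) for -1 < r < 1."""
    log_c = (math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
             - math.lgamma((n - 2) / 2))
    return math.exp(log_c) * (1.0 - r * r) ** ((n - 4) / 2)


def two_sided_p_value(r, n, steps=2000):
    """P(|R| >= |r|) under rho = 0, via composite Simpson's rule on [|r|, 1]."""
    if n < 4:
        raise ValueError("this sketch assumes n >= 4 (bounded integrand)")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a = abs(r)
    h = (1.0 - a) / steps
    total = r_null_density(a, n) + r_null_density(1.0, n)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * r_null_density(a + k * h, n)
    # Factor 2 covers both tails; the null density is symmetric about 0.
    return min(1.0, 2.0 * total * h / 3.0)


# Usage: reject rho = 0 at level alpha when the p-value falls below alpha.
p = two_sided_p_value(0.5, 6)
print(round(p, 4))  # 0.3125, so r = 0.5 with n = 6 is not significant at the 5% level
```

Integrating the density of $r$ directly sidesteps the need for a $t$ quantile function; with a statistics library to hand, one would instead compare $|t|$ with the upper $\alpha/2$ point of $t_{n-2}$.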