Paper 3, Section I, 5K

Statistical Modelling
Part II, 2014

In an experiment to study factors affecting the production of the plastic polyvinyl chloride (PVC), three experimenters each used eight devices to produce the PVC and measured the sizes of the particles produced. For each of the 24 combinations of device and experimenter, two size measurements were obtained.

The experimenters and devices used for each of the 48 measurements are stored in R as factors in the objects experimenter and device respectively, with the measurements themselves stored in the vector psize. The following analysis was performed in R.

Let $X$ and $X_{0}$ denote the design matrices obtained by model.matrix(fit) and model.matrix(fit0) respectively, and let $Y$ denote the response psize. Let $P$ and $P_{0}$ denote orthogonal projections onto the column spaces of $X$ and $X_{0}$ respectively.

For each of the following quantities, write down its numerical value if it appears in the analysis of variance table above; otherwise write 'unknown'.

  1. $\|(I-P) Y\|^{2}$

  2. $\left\|X\left(X^{T} X\right)^{-1} X^{T} Y\right\|^{2}$

  3. $\left\|\left(I-P_{0}\right) Y\right\|^{2}-\|(I-P) Y\|^{2}$

  4. $\frac{\left\|\left(P-P_{0}\right) Y\right\|^{2} / 14}{\|(I-P) Y\|^{2} / 24}$

  5. $\sum_{i=1}^{48} Y_{i} / 48$
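The five quantities above can be computed directly from the projections and compared with an analysis of variance table. The following is a minimal sketch only, under several assumptions that are not taken from the question: the psize values are simulated stand-ins (the real data are not reproduced here), fit is assumed to be the model with an experimenter by device interaction and fit0 the additive model (a guess consistent with the degrees of freedom 14 and 24 in item 4), and the table is assumed to come from anova(fit0, fit).

```r
# Simulated stand-in data with the same layout as the experiment:
# 3 experimenters x 8 devices x 2 replicates = 48 measurements.
set.seed(1)
experimenter <- gl(3, 16)               # each experimenter contributes 16 rows
device       <- gl(8, 2, length = 48)   # each device appears twice per experimenter
psize        <- rnorm(48, mean = 30, sd = 5)   # hypothetical values, not the real data

# ASSUMPTION: these model formulas are not given in the question.
fit  <- lm(psize ~ experimenter * device)   # larger model (24 parameters)
fit0 <- lm(psize ~ experimenter + device)   # smaller model (10 parameters)

X  <- model.matrix(fit)
X0 <- model.matrix(fit0)
Y  <- psize

# Orthogonal projections onto the column spaces of X and X0.
P  <- X  %*% solve(crossprod(X),  t(X))
P0 <- X0 %*% solve(crossprod(X0), t(X0))
I  <- diag(length(Y))

q1 <- sum(((I - P) %*% Y)^2)                         # item 1
q2 <- sum((P %*% Y)^2)                               # item 2
q3 <- sum(((I - P0) %*% Y)^2) - q1                   # item 3
q4 <- (sum(((P - P0) %*% Y)^2) / 14) / (q1 / 24)     # item 4
q5 <- sum(Y) / 48                                    # item 5

# ASSUMPTION: the table in the question is of this form.
tab <- anova(fit0, fit)
print(tab)
print(c(q1 = q1, q2 = q2, q3 = q3, q4 = q4, q5 = q5))
```

Printing the table next to the five values makes it easy to see which of them correspond to table entries for this simulated data set. With the real psize data the numbers would differ, but the same comparison applies.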

Out of the two models that have been fitted, which appears to be the more appropriate for the data according to the analysis performed, and why?