A2.12

Computational Statistics and Statistical Modelling
Part II, 2004

(i) Suppose we have independent observations $Y_{1}, \ldots, Y_{n}$, and we assume that for $i=1, \ldots, n$, $Y_{i}$ is Poisson with mean $\mu_{i}$, and $\log \left(\mu_{i}\right)=\beta^{T} x_{i}$, where $x_{1}, \ldots, x_{n}$ are given covariate vectors each of dimension $p$, where $\beta$ is an unknown vector of dimension $p$, and $p<n$. Assuming that $\left\{x_{1}, \ldots, x_{n}\right\}$ span $\mathbb{R}^{p}$, find the equation for $\hat{\beta}$, the maximum likelihood estimator of $\beta$, and write down the large-sample distribution of $\hat{\beta}$.
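
As a hedged sketch of how part (i) can be set up (not a model answer from the examiners), the log-likelihood under the stated Poisson log-linear model can be differentiated directly. The symbols $\ell$ (log-likelihood) and $y_{i}$ (the observed value of $Y_{i}$) are notation introduced here for illustration and do not appear in the question.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \left( y_{i}\, \beta^{T} x_{i} - e^{\beta^{T} x_{i}} - \log y_{i}! \right), \\
\frac{\partial \ell}{\partial \beta} &= \sum_{i=1}^{n} \left( y_{i} - e^{\beta^{T} x_{i}} \right) x_{i},
\qquad \text{so } \hat{\beta} \text{ solves } \sum_{i=1}^{n} x_{i}\, y_{i} = \sum_{i=1}^{n} x_{i}\, e^{\hat{\beta}^{T} x_{i}}, \\
-\frac{\partial^{2} \ell}{\partial \beta\, \partial \beta^{T}} &= \sum_{i=1}^{n} \mu_{i}\, x_{i} x_{i}^{T},
\qquad \text{and approximately } \hat{\beta} \sim N_{p}\!\left( \beta,\ \Big( \sum_{i=1}^{n} \mu_{i}\, x_{i} x_{i}^{T} \Big)^{-1} \right).
\end{align*}
```

The spanning assumption on $\left\{x_{1}, \ldots, x_{n}\right\}$ together with $\mu_{i}>0$ makes the information matrix positive definite, so the inverse in the large-sample covariance exists and $\ell$ is strictly concave in $\beta$.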

(ii) A long-term agricultural experiment had 90 grassland plots, each $25 \mathrm{~m} \times 25 \mathrm{~m}$, differing in biomass, soil pH, and species richness (the count of species in the whole plot). While it was well known that species richness declines with increasing biomass, it was not known how this relationship depends on soil pH, which for the given study has possible values "low", "medium" or "high", each taken 30 times. Explain the commands input, and interpret the resulting output in the (slightly edited) $R$ output below, in which "species" represents the species count.

(The first and last 2 lines of the data are reproduced here as an aid. You may assume that the factor pH has been correctly set up.)