Paper 2, Section II, J

Applied Probability
Part II, 2014

(i) Explain what the Moran model and the infinite alleles model are. State Ewens' sampling formula for the distribution of the allelic frequency spectrum $\left(a_{1}, \ldots, a_{n}\right)$ in terms of $\theta$, where $\theta=N u$ with $u$ denoting the mutation rate per individual and $N$ the population size.

Let $K_{n}$ be the number of allelic types in a sample of size $n$. Give, without justification, an expression for $\mathbb{E}\left(K_{n}\right)$ in terms of $\theta$.

(ii) Let $K_{n}$ and $\theta$ be as above. Show that for $1 \leqslant k \leqslant n$ we have that

$$\mathbb{P}\left(K_{n}=k\right)=C \frac{\theta^{k}}{\theta(\theta+1) \cdots(\theta+n-1)}$$

for some constant $C$ that does not depend on $\theta$.

Show that, given $\left\{K_{n}=k\right\}$, the distribution of the allelic frequency spectrum $\left(a_{1}, \ldots, a_{n}\right)$ does not depend on $\theta$.

Show that the value of $\theta$ which maximises $\mathbb{P}\left(K_{n}=k\right)$ is the one for which $k=\mathbb{E}\left(K_{n}\right)$.
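The last claim can be checked numerically before it is proved. The sketch below is not part of the question. It assumes the standard expression $\mathbb{E}(K_n)=\sum_{i=0}^{n-1} \theta/(\theta+i)$ and uses the form of $\mathbb{P}(K_n=k)$ from part (ii) with the constant $C$ dropped. The function names and the values $n=10$, $k=4$ are hypothetical choices for illustration.

```python
import math


def expected_types(theta, n):
    # Assumed standard form: E(K_n) = sum_{i=0}^{n-1} theta / (theta + i).
    return sum(theta / (theta + i) for i in range(n))


def log_likelihood(theta, n, k):
    # log P(K_n = k) up to the additive constant log C, which is free of theta.
    return k * math.log(theta) - sum(math.log(theta + i) for i in range(n))


def mle_theta(n, k, lo=1e-9, hi=1e9, iters=200):
    # E(K_n) increases in theta from 1 to n, so for 1 < k < n the equation
    # E(K_n) = k has a unique root; locate it by bisection on a log scale.
    if not 1 < k < n:
        raise ValueError("need 1 < k < n for an interior maximiser")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if expected_types(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    n, k = 10, 4  # illustrative sample size and observed number of types
    theta_hat = mle_theta(n, k)
    print("theta_hat:", theta_hat)
    print("E(K_n) at theta_hat:", expected_types(theta_hat, n))
    # The log-likelihood should be no larger on either side of theta_hat.
    for factor in (0.9, 1.0, 1.1):
        print(factor, log_likelihood(factor * theta_hat, n, k))
```

The root of $\mathbb{E}(K_n)=k$ is found first and the likelihood is then compared on either side of it, which mirrors the statement to be shown. The boundary cases $k=1$ and $k=n$ are excluded because no interior maximiser exists there.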