Paper 2, Section II, J

Mathematics of Machine Learning
Part II, 2020

(a) Let F\mathcal{F} be a family of functions f:X{0,1}f: \mathcal{X} \rightarrow\{0,1\}. What does it mean for x1:nXnx_{1: n} \in \mathcal{X}^{n} to be shattered by F\mathcal{F} ? Define the shattering coefficient s(F,n)s(\mathcal{F}, n) and the VCV C dimension VC(F)\operatorname{VC}(\mathcal{F}) of F\mathcal{F}

Let

A={j=1d(,aj]:a1,,adR}\mathcal{A}=\left\{\prod_{j=1}^{d}\left(-\infty, a_{j}\right]: a_{1}, \ldots, a_{d} \in \mathbb{R}\right\}

and set F={1A:AA}\mathcal{F}=\left\{\mathbf{1}_{A}: A \in \mathcal{A}\right\}. Compute VC(F)\operatorname{VC}(\mathcal{F}).

(b) State the Sauer-Shelah lemma.

(c) Let F1,,Fr\mathcal{F}_{1}, \ldots, \mathcal{F}_{r} be families of functions f:X{0,1}f: \mathcal{X} \rightarrow\{0,1\} with finite VC dimension v1v \geqslant 1. Now suppose x1:nx_{1: n} is shattered by k=1rFk\cup_{k=1}^{r} \mathcal{F}_{k}. Show that

2nr(n+1)v.2^{n} \leqslant r(n+1)^{v} .

Conclude that for v3v \geqslant 3,

VC(k=1rFk)4vlog2(2v)+2log2(r)\mathrm{VC}\left(\cup_{k=1}^{r} \mathcal{F}_{k}\right) \leqslant 4 v \log _{2}(2 v)+2 \log _{2}(r)

[You may use without proof the fact that if xα+βlog2(x+1)x \leqslant \alpha+\beta \log _{2}(x+1) with α>0\alpha>0 and β3\beta \geqslant 3, then x4βlog2(2β)+2αx \leqslant 4 \beta \log _{2}(2 \beta)+2 \alpha for x1x \geqslant 1.]

(d) Now let B\mathcal{B} be the collection of subsets of Rp\mathbb{R}^{p} of the form of a product j=1pAj\prod_{j=1}^{p} A_{j} of intervals AjA_{j}, where exactly d{1,,p}d \in\{1, \ldots, p\} of the AjA_{j} are of the form (,aj]\left(-\infty, a_{j}\right] for ajRa_{j} \in \mathbb{R} and the remaining pdp-d intervals are R\mathbb{R}. Set G={1B:BB}\mathcal{G}=\left\{\mathbf{1}_{B}: B \in \mathcal{B}\right\}. Show that when d3d \geqslant 3,

VC(G)2d[2log2(2d)+log2(p)]\mathrm{VC}(\mathcal{G}) \leqslant 2 d\left[2 \log _{2}(2 d)+\log _{2}(p)\right]