Paper 4, Section II, J

Mathematics of Machine Learning
Part II, 2021

Let $D=\left(x_{i}, y_{i}\right)_{i=1}^{n}$ be a dataset of $n$ input-output pairs lying in $\mathbb{R}^{p} \times[-M, M]$ for $M \in \mathbb{R}$. Describe the random-forest algorithm as applied to $D$ using decision trees $\left(\hat{T}^{(b)}\right)_{b=1}^{B}$ to produce a fitted regression function $f_{\mathrm{rf}}$. [You need not explain in detail the construction of decision trees, but should describe any modifications specific to the random-forest algorithm.]
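
[For orientation, the prediction step being asked about is usually written as the average
$$f_{\mathrm{rf}}(x)=\frac{1}{B} \sum_{b=1}^{B} \hat{T}^{(b)}(x),$$
where each $\hat{T}^{(b)}$ is a decision tree grown on an independent bootstrap resample of $D$, with every split restricted to a randomly chosen subset of the $p$ coordinates. This is a sketch of the standard construction, not a substitute for the description requested above.]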

Briefly explain why for each $x \in \mathbb{R}^{p}$ and $b=1, \ldots, B$, we have $\hat{T}^{(b)}(x) \in[-M, M]$.

State the bounded-differences inequality.
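
[For reference, one standard formulation of this inequality (McDiarmid's inequality), in notation compatible with the hint below: if $W_{1}, \ldots, W_{B}$ are independent random variables taking values in a set $\mathcal{Z}$, and $G: \mathcal{Z}^{B} \rightarrow \mathbb{R}$ is such that for each $b$ and all $z_{1}, \ldots, z_{B}, z_{b}^{\prime} \in \mathcal{Z}$,
$$\left|G\left(z_{1}, \ldots, z_{b}, \ldots, z_{B}\right)-G\left(z_{1}, \ldots, z_{b}^{\prime}, \ldots, z_{B}\right)\right| \leqslant c_{b},$$
then for every $t>0$,
$$\mathbb{P}\left(G\left(W_{1}, \ldots, W_{B}\right)-\mathbb{E}\, G\left(W_{1}, \ldots, W_{B}\right) \geqslant t\right) \leqslant \exp \left(-\frac{2 t^{2}}{\sum_{b=1}^{B} c_{b}^{2}}\right).$$
The statement requested may of course be given in an equivalent form.]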

Treating $D$ as deterministic, show that with probability at least $1-\delta$,

$$\sup _{x \in \mathbb{R}^{p}}\left|f_{\mathrm{rf}}(x)-\mu(x)\right| \leqslant M \sqrt{\frac{2 \log (1 / \delta)}{B}}+\mathbb{E}\left(\sup _{x \in \mathbb{R}^{p}}\left|f_{\mathrm{rf}}(x)-\mu(x)\right|\right),$$

where $\mu(x):=\mathbb{E} f_{\mathrm{rf}}(x)$.

[Hint: Treat each $\hat{T}^{(b)}$ as a random variable taking values in an appropriate space $\mathcal{Z}$ (of functions), and consider a function $G$ satisfying

$$G\left(\hat{T}^{(1)}, \ldots, \hat{T}^{(B)}\right)=\sup _{x \in \mathbb{R}^{p}}\left|f_{\mathrm{rf}}(x)-\mu(x)\right| . \big]$$
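
[One way the hint can be combined with the bounded-differences inequality, sketched here under the convention that $f_{\mathrm{rf}}$ is the average of the $B$ trees: replacing a single tree $\hat{T}^{(b)}$ by another function $\tilde{T}^{(b)}$ taking values in $[-M, M]$ changes $f_{\mathrm{rf}}(x)$ by at most $2 M / B$ for every $x$, and hence, by the triangle inequality for suprema, changes $G$ by at most $c_{b}=2 M / B$. The bounded-differences inequality then gives, for $t>0$,
$$\mathbb{P}\left(G-\mathbb{E}\, G \geqslant t\right) \leqslant \exp \left(-\frac{2 t^{2}}{B(2 M / B)^{2}}\right)=\exp \left(-\frac{B t^{2}}{2 M^{2}}\right),$$
and setting the right-hand side equal to $\delta$ yields $t=M \sqrt{2 \log (1 / \delta) / B}$, the deviation term in the displayed bound.]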