Paper 2, Section II, J

Mathematics of Machine Learning
Part II, 2021

(a) What is meant by the subdifferential $\partial f(x)$ of a convex function $f: \mathbb{R}^{d} \rightarrow \mathbb{R}$ at $x \in \mathbb{R}^{d}$? Write down the subdifferential $\partial f(x)$ of the function $f: \mathbb{R} \rightarrow \mathbb{R}$ given by $f(x) = \gamma|x|$, where $\gamma > 0$.
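[Reminder, not part of the question: the standard definition from the course is that $\partial f(x)$ collects the slopes of all affine minorants of $f$ that are tight at $x$,
$$\partial f(x) = \left\{ v \in \mathbb{R}^{d} : f(y) \geqslant f(x) + v^{T}(y - x) \text{ for all } y \in \mathbb{R}^{d} \right\},$$
and for $f(x) = \gamma|x|$ on $\mathbb{R}$ this works out to $\{-\gamma\}$ for $x < 0$, the interval $[-\gamma, \gamma]$ at $x = 0$, and $\{\gamma\}$ for $x > 0$.]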

Show that $x$ minimises $f$ if and only if $0 \in \partial f(x)$.
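[Orientation only: both directions are the same inequality read two ways, since $0 \in \partial f(x)$ means exactly that $f(y) \geqslant f(x) + 0^{T}(y - x) = f(x)$ for all $y$.]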

What does it mean for a function $f: \mathbb{R}^{d} \rightarrow \mathbb{R}$ to be strictly convex? Show that any minimiser of a strictly convex function must be unique.
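[Again as a reminder only: the usual definition is that $f$ is strictly convex if
$$f(tx + (1-t)y) < t f(x) + (1-t) f(y) \quad \text{for all } x \neq y \text{ and } t \in (0,1);$$
uniqueness of a minimiser then follows by evaluating $f$ at the midpoint of two putative minimisers.]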

(b) Suppose we have input-output pairs $(x_{1}, y_{1}), \ldots, (x_{n}, y_{n}) \in \{-1,1\}^{p} \times \{-1,1\}$ with $p \geqslant 2$. Consider the objective function

$$f(\beta) = \frac{1}{n} \sum_{i=1}^{n} \exp\left(-y_{i} x_{i}^{T} \beta\right) + \gamma\|\beta\|_{1},$$

where $\beta = (\beta_{1}, \ldots, \beta_{p})^{T}$ and $\gamma > 0$. Assume that $(y_{i})_{i=1}^{n} \neq (x_{i1})_{i=1}^{n}$. Fix $\beta_{2}, \ldots, \beta_{p}$ and define

$$\kappa_{1} = \sum_{\substack{1 \leqslant i \leqslant n:\\ x_{i1} \neq y_{i}}} \exp\left(-y_{i} \eta_{i}\right) \quad \text{and} \quad \kappa_{2} = \sum_{i=1}^{n} \exp\left(-y_{i} \eta_{i}\right),$$

where $\eta_{i} = \sum_{j=2}^{p} x_{ij} \beta_{j}$ for $i = 1, \ldots, n$. Show that if $\left|2 \kappa_{1} - \kappa_{2}\right| \leqslant \gamma$, then

$$\operatorname{argmin}_{\beta_{1} \in \mathbb{R}} f\left(\beta_{1}, \beta_{2}, \ldots, \beta_{p}\right) = 0.$$
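[Editor's note, not part of the paper: a quick numerical sanity check of this claim. Everything below is illustrative, including the random data, the variable names (`beta_rest`, `kappa1`, and so on), and the choice of $\gamma$, which is simply taken large enough that the hypothesis $|2\kappa_{1} - \kappa_{2}| \leqslant \gamma$ holds; a grid search then confirms that the minimiser over $\beta_{1}$ sits at $0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3

# Illustrative data: rows x_i in {-1,1}^p, labels y_i in {-1,1}.
X = rng.choice([-1, 1], size=(n, p))
y = rng.choice([-1, 1], size=n)
beta_rest = rng.normal(scale=0.3, size=p - 1)  # fixed beta_2, ..., beta_p

eta = X[:, 1:] @ beta_rest            # eta_i = sum_{j=2}^p x_ij * beta_j
w = np.exp(-y * eta)
kappa1 = w[X[:, 0] != y].sum()        # sum over i with x_i1 != y_i
kappa2 = w.sum()

# Choose gamma so the hypothesis |2*kappa1 - kappa2| <= gamma holds.
gamma = abs(2 * kappa1 - kappa2) + 0.1

def f(beta1):
    """The objective as a function of beta_1 alone, others held fixed."""
    beta = np.concatenate([[beta1], beta_rest])
    return np.exp(-y * (X @ beta)).mean() + gamma * np.abs(beta).sum()

# Minimise over a grid containing 0; the claim says the argmin is 0.
grid = np.linspace(-2, 2, 4001)
vals = np.array([f(b) for b in grid])
print("grid minimiser of beta_1:", grid[vals.argmin()])  # expect ~0.0
```

The grid deliberately contains $0$, and the map $\beta_{1} \mapsto f(\beta_{1}, \beta_{2}, \ldots, \beta_{p})$ is strictly convex (each exponential term is strictly convex in $\beta_{1}$ since $x_{i1} = \pm 1$), so the grid minimiser is a faithful proxy for the true argmin.]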

[You may use any results from the course without proof, other than those whose proof is asked for directly.]