(a) Let Z1,…,Zn be i.i.d. random elements taking values in a set Z and let F be a class of functions f:Z→R. Define the Rademacher complexity Rn(F). Write down an inequality relating the Rademacher complexity and
E(f∈Fsupn1i=1∑n(f(Zi)−Ef(Zi)))
State the bounded differences inequality.
(b) Now given i.i.d. input-output pairs (X1,Y1),…,(Xn,Yn)∈X×{−1,1} consider performing empirical risk minimisation with misclassification loss over the class H of classifiers h:X→{−1,1}. Denote by h^ the empirical risk minimiser [which you may assume exists]. For any classifier h, let R(h) be its misclassification risk and suppose this is minimised over H by h∗∈H. Prove that with probability at least 1−δ,
R(h^)−R(h∗)⩽2Rn(F)+n2log(2/δ)
for δ∈(0,1], where F is a class of functions f:X×{−1,1}→{0,1} related to H that you should specify.
(c) Let Zi=(Xi,Yi) for i=1,…,n. Define the empirical Rademacher complexity R^(F(Z1:n)). Show that with probability at least 1−δ,