Suppose we have input-output pairs $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^p \times \{-1, 1\}$. Consider the empirical risk minimisation problem with hypothesis class
\[
\mathcal{H} = \{x \mapsto x^T \beta : \beta \in C\},
\]
where $C$ is a non-empty closed convex subset of $\mathbb{R}^p$, and logistic loss
\[
\ell(h(x), y) = \log_2\bigl(1 + e^{-y h(x)}\bigr),
\]
for $h \in \mathcal{H}$ and $(x, y) \in \mathbb{R}^p \times \{-1, 1\}$.
(i) Show that the objective function
\[
f(\beta) = \frac{1}{n} \sum_{i=1}^{n} \ell(x_i^T \beta, y_i) = \frac{1}{n} \sum_{i=1}^{n} \log_2\bigl(1 + e^{-y_i x_i^T \beta}\bigr)
\]
of the optimisation problem is convex.
(ii) Let $\pi_C(x)$ denote the projection of $x$ onto $C$. Describe the stochastic gradient descent (SGD) procedure for minimising $f$ above, giving explicit forms for any gradients used in the algorithm.
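As a concrete illustration (not part of the question), here is a minimal NumPy sketch of the procedure. It assumes, purely for illustration, that $C$ is the $\ell_2$ ball of radius $R$, so that the projection has the closed form $\pi_C(\beta) = \beta \min\{1, R/\|\beta\|_2\}$; it returns the averaged iterate used in part (iii).

\begin{verbatim}
import numpy as np

def projected_sgd(X, y, R, eta, k, seed=None):
    """Projected SGD for f(beta) = (1/n) sum_i log2(1 + exp(-y_i x_i^T beta)).

    Illustrative only: C is taken to be the l2 ball of radius R, so
    pi_C(b) = b * min(1, R / ||b||_2).  Returns the average of the k iterates.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)                    # beta_1 = 0 lies in C
    running_sum = np.zeros(p)
    for _ in range(k):
        i = rng.integers(n)               # sample one data point uniformly
        margin = y[i] * (X[i] @ beta)
        # d/du log2(1 + e^{-u}) = -1 / ((1 + e^u) log 2), so the chain rule
        # gives the stochastic gradient -y_i x_i / ((1 + e^margin) log 2).
        sigma = 0.5 * (1.0 - np.tanh(margin / 2.0))  # stable 1 / (1 + e^margin)
        grad = -y[i] * sigma / np.log(2.0) * X[i]
        beta = beta - eta * grad          # gradient step ...
        norm = np.linalg.norm(beta)
        if norm > R:                      # ... then project back onto C
            beta = beta * (R / norm)
        running_sum += beta
    return running_sum / k                # the averaged iterate
\end{verbatim}

Returning the average of the iterates, rather than the final one, is the form of output to which the bounds in parts (iii) and (iv) refer.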
(iii) Suppose $\hat{\beta}$ minimises $f(\beta)$ over $\beta \in C$. Suppose $\max_{i=1,\ldots,n} \|x_i\|_2 \leqslant M$ and $\sup_{\beta \in C} \|\beta\|_2 \leqslant R$. Prove that the output $\bar{\beta}$ of $k$ iterations of the SGD algorithm with some fixed step size $\eta$ (which you should specify) satisfies
\[
\mathbb{E} f(\bar{\beta}) - f(\hat{\beta}) \leqslant \frac{2MR}{\log(2)\sqrt{k}}.
\]
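As a sanity check on the constants (not part of the question), the stochastic gradients here are uniformly bounded: for each $i$,
\[
\bigl\| \nabla_\beta \log_2\bigl(1 + e^{-y_i x_i^T \beta}\bigr) \bigr\|_2
= \frac{1}{\log 2} \cdot \frac{e^{-y_i x_i^T \beta}}{1 + e^{-y_i x_i^T \beta}} \, \|x_i\|_2
\leqslant \frac{M}{\log 2},
\]
and this Lipschitz constant is how $M$ and $\log(2)$ enter the bound above.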
(iv) Now suppose that the step size at iteration $s$ is $\eta_s > 0$ for each $s = 1, \ldots, k$. Show that, writing $\beta_s$ for the $s$th iterate of SGD, we have
\[
\mathbb{E} f(\tilde{\beta}) - f(\hat{\beta}) \leqslant \frac{A_2 M^2}{2 A_1 \{\log(2)\}^2} + \frac{2R^2}{A_1},
\]
where
\[
\tilde{\beta} = \frac{1}{A_1} \sum_{s=1}^{k} \eta_s \beta_s, \qquad A_1 = \sum_{s=1}^{k} \eta_s \quad \text{and} \quad A_2 = \sum_{s=1}^{k} \eta_s^2.
\]
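A hint on the first step (valid by the convexity established in part (i)): Jensen's inequality gives
\[
f(\tilde{\beta}) = f\Bigl(\frac{1}{A_1} \sum_{s=1}^{k} \eta_s \beta_s\Bigr) \leqslant \frac{1}{A_1} \sum_{s=1}^{k} \eta_s f(\beta_s),
\]
so it suffices to bound the $\eta_s$-weighted average of $\mathbb{E} f(\beta_s) - f(\hat{\beta})$.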
[You may use standard properties of convex functions and projections onto closed convex sets without proof provided you state them.]
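For instance, two standard facts commonly invoked in this argument are the non-expansiveness of the projection onto a closed convex set,
\[
\|\pi_C(u) - \pi_C(v)\|_2 \leqslant \|u - v\|_2 \quad \text{for all } u, v \in \mathbb{R}^p,
\]
and the first-order characterisation of convexity for differentiable $f$,
\[
f(\beta') \geqslant f(\beta) + \nabla f(\beta)^T (\beta' - \beta) \quad \text{for all } \beta, \beta'.
\]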