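An observable scalar state variable evolves as $x_{t+1} = x_t + u_t$, $t = 0, 1, \dots$. Let controls $u_0, u_1, \dots$ be determined by a policy $\pi$ and define
$$C_s(\pi, x_0) = \sum_{t=0}^{s-1} \left( x_t^2 + 2 x_t u_t + 7 u_t^2 \right) \quad \text{and} \quad C_s(x_0) = \inf_{\pi} C_s(\pi, x_0).$$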
Show that it is possible to express $C_s(x_0)$ in terms of $\Pi_s$, which satisfies the recurrence
$$\Pi_s = \frac{6(1 + \Pi_{s-1})}{7 + \Pi_{s-1}}, \quad s = 1, 2, \dots,$$
with $\Pi_0 = 0$.
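As a numerical sanity check on the recurrence (not a substitute for the derivation asked for), the following minimal sketch iterates $\Pi_s = 6(1+\Pi_{s-1})/(7+\Pi_{s-1})$ from $\Pi_0 = 0$ using exact rational arithmetic. The function name `riccati_sequence` and the choice of 20 steps are illustrative assumptions, not part of the problem.

```python
from fractions import Fraction


def riccati_sequence(steps):
    """Return [Pi_0, ..., Pi_steps] for Pi_s = 6(1 + Pi_{s-1}) / (7 + Pi_{s-1}), Pi_0 = 0.

    Hypothetical helper for illustration; exact fractions avoid rounding noise.
    """
    pi = Fraction(0)
    values = [pi]
    for _ in range(steps):
        pi = 6 * (1 + pi) / (7 + pi)
        values.append(pi)
    return values


values = riccati_sequence(20)  # 20 steps is an arbitrary illustrative horizon

# Print the first few terms as floats to see the trend.
for s, v in enumerate(values[:6]):
    print(s, v, float(v))
```

The printed terms can be compared with the fixed points of the map, which solve $\Pi^2 + \Pi - 6 = 0$, that is $\Pi = 2$ or $\Pi = -3$.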
Deduce that $C_\infty(x_0) \geqslant 2 x_0^2$. [$C_\infty(x_0)$ is defined as $\lim_{s \to \infty} C_s(x_0)$.]
By considering the policy $\pi^*$ which takes $u_t = -(1/3)(2/3)^t x_0$, $t = 0, 1, \dots$, show that $C_\infty(x_0) = 2 x_0^2$.
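The cost of this open-loop policy can be checked numerically before proving it. The sketch below simulates $x_{t+1} = x_t + u_t$ under $u_t = -(1/3)(2/3)^t x_0$ and accumulates the stage costs $x_t^2 + 2 x_t u_t + 7 u_t^2$. The function name `open_loop_cost`, the starting point $x_0 = 3$ and the horizon of 200 steps are assumptions chosen for illustration only.

```python
def open_loop_cost(x0, horizon):
    """Simulate x_{t+1} = x_t + u_t under u_t = -(1/3)(2/3)^t x0.

    Returns (accumulated cost, final state). Hypothetical helper for illustration.
    """
    x = x0
    total = 0.0
    for t in range(horizon):
        u = -(1.0 / 3.0) * (2.0 / 3.0) ** t * x0   # open-loop control, depends on x0 only
        total += x * x + 2.0 * x * u + 7.0 * u * u  # stage cost from the problem statement
        x = x + u                                   # state update
    return total, x


x0 = 3.0                                # illustrative starting state
cost, x_final = open_loop_cost(x0, 200)
print(cost, 2.0 * x0 ** 2, x_final)     # compare the simulated cost with 2 * x0^2
```

A finite horizon only approximates the infinite sum, but the terms decay geometrically, so a few hundred steps is ample for floating-point agreement.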
Give an alternative description of $\pi^*$ in closed-loop form.