Consider the scalar system with plant equation $x_{t+1} = x_t + u_t$, $t = 0, 1, \dots$, and cost
$$C_s(x_0, u_0, u_1, \dots) = \sum_{t=0}^{s} \left[ u_t^2 + \tfrac{4}{3} x_t^2 \right].$$
Show from first principles that $\min_{u_0, u_1, \dots} C_s = V_s x_0^2$, where $V_0 = 4/3$ and, for $s = 0, 1, \dots$,
$$V_{s+1} = \frac{4}{3} + \frac{V_s}{1 + V_s}.$$
Show that $V_s \to 2$ as $s \to \infty$.
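As a numerical sanity check on the recursion (not a substitute for the proof), the following minimal sketch iterates $V_{s+1} = 4/3 + V_s/(1+V_s)$ from $V_0 = 4/3$ in exact rational arithmetic. The function name `value_recursion` and the choice of 10 steps are illustrative assumptions, not part of the question.

```python
from fractions import Fraction

def value_recursion(steps):
    """Iterate V_{s+1} = 4/3 + V_s / (1 + V_s), starting from V_0 = 4/3.

    Returns the list [V_0, V_1, ..., V_steps] as exact fractions.
    """
    v = Fraction(4, 3)
    values = [v]
    for _ in range(steps):
        v = Fraction(4, 3) + v / (1 + v)
        values.append(v)
    return values

values = value_recursion(10)
for s, v in enumerate(values):
    # Print each iterate together with its gap to the conjectured limit 2.
    print(s, float(v), float(2 - v))
```

The printed gap should shrink at every step, which suggests the monotone, bounded behaviour that a proof would need to establish.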
Prove that $C_\infty$ is minimized by the stationary control $u_t = -2x_t/3$ for all $t$.
Consider the stationary policy $\pi_0$ that has $u_t = -x_t$ for all $t$. What is the value of $C_\infty$ under this policy?
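One way to check an answer here is to simulate the plant under a linear stationary policy $u_t = k x_t$ and add up the stage costs over a long finite horizon. The sketch below does this; the helper `truncated_cost`, the initial state $x_0 = 1$, and the horizon of 200 steps are assumptions made for illustration.

```python
def truncated_cost(k, x0=1.0, horizon=200):
    """Sum u_t^2 + (4/3) x_t^2 for t = 0..horizon under u_t = k * x_t."""
    x = x0
    total = 0.0
    for _ in range(horizon + 1):
        u = k * x
        total += u ** 2 + (4.0 / 3.0) * x ** 2
        x = x + u  # plant equation x_{t+1} = x_t + u_t
    return total

# Compare the candidate optimal gain with the gain used by pi_0.
print("k = -2/3:", truncated_cost(-2.0 / 3.0))
print("k = -1  :", truncated_cost(-1.0))
```

The truncated sum approximates $C_\infty$ only when $|1 + k| < 1$, so that $x_t$ decays geometrically; for other gains the sum grows with the horizon.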
Consider the following algorithm, in which steps 1 and 2 are repeated as many times as desired.
1. For a given stationary policy $\pi_n$, for which $u_t = k_n x_t$ for all $t$, determine the value of $C_\infty$ under this policy as $V_{\pi_n} x_0^2$ by solving for $V_{\pi_n}$ in
$$V_{\pi_n} = k_n^2 + \frac{4}{3} + (1 + k_n)^2 V_{\pi_n}.$$
2. Now find $k_{n+1}$ as the minimizer of
$$k_{n+1}^2 + \frac{4}{3} + (1 + k_{n+1})^2 V_{\pi_n}$$
and define $\pi_{n+1}$ as the policy for which $u_t = k_{n+1} x_t$ for all $t$.
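The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `evaluate`, `improve`, and `policy_iteration` are hypothetical, the evaluation step uses the closed-form solution of the linear equation in step 1 (valid only when $|1 + k_n| < 1$), and the improvement step uses the stationary point of the quadratic in step 2.

```python
from fractions import Fraction

Q = Fraction(4, 3)  # weight on x_t^2 in the stage cost

def evaluate(k):
    """Step 1: solve V = k^2 + 4/3 + (1 + k)^2 V for V."""
    if abs(1 + k) >= 1:
        # The closed loop does not decay, so the cost is infinite for x0 != 0.
        raise ValueError("policy is not stabilising")
    return (k * k + Q) / (1 - (1 + k) ** 2)

def improve(v):
    """Step 2: minimise k^2 + 4/3 + (1 + k)^2 v over k.

    Setting the derivative 2k + 2(1 + k)v to zero gives k = -v / (1 + v).
    """
    return -v / (1 + v)

def policy_iteration(k0, n_iter):
    """Return [(k_0, V_0), (k_1, V_1), ...] for n_iter evaluation steps."""
    k = Fraction(k0)
    history = []
    for _ in range(n_iter):
        v = evaluate(k)
        history.append((k, v))
        k = improve(v)
    return history

history = policy_iteration(-1, 4)
for n, (k, v) in enumerate(history):
    print(n, k, v, float(v))
```

Exact fractions are used so that the successive gains and values can be compared without rounding error; floats would work equally well for a quick check.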
Explain why $\pi_{n+1}$ is guaranteed to be a better policy than $\pi_n$.
Let $\pi_0$ be the stationary policy with $u_t = -x_t$. Determine $\pi_1$ and verify that it minimizes $C_\infty$ to within 0.2% of its optimum.