Consider the scalar system evolving as
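$$x_t = x_{t-1} + u_{t-1} + \epsilon_t, \qquad t = 1, 2, \ldots,$$
where $\{\epsilon_t\}_{t=1}^{\infty}$ is a white noise sequence with $E\epsilon_t = 0$ and $E\epsilon_t^2 = v$. It is desired to choose controls $\{u_t\}_{t=0}^{h-1}$ to minimize $E\left[\sum_{t=0}^{h-1}\left(\tfrac{1}{2}x_t^2 + u_t^2\right) + x_h^2\right]$. Show that for $h = 6$ the minimal cost is $x_0^2 + 6v$.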
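As a numerical sanity check on the claimed value (not a substitute for the proof), the backward recursion can be run directly. The sketch below assumes the cost-to-go has the quadratic form P x^2 + c, starting from the terminal cost x_h^2; the function name `riccati_cost` and the use of exact fractions are illustrative choices, not part of the problem.

```python
from fractions import Fraction

def riccati_cost(h, v):
    """Backward recursion for a cost-to-go of the assumed form P*x^2 + c.

    Terminal condition: P = 1, c = 0 (terminal cost x_h^2).
    One step back: minimise 1/2 x^2 + u^2 + P (x+u)^2 + P v + c over u.
    The minimiser is u = -P x / (1 + P), which leaves the quadratic
    coefficient 1/2 + P / (1 + P) and adds P v to the constant.
    """
    P, c = Fraction(1), Fraction(0)
    for _ in range(h):
        c = c + P * v                      # noise contribution uses the old P
        P = Fraction(1, 2) + P / (1 + P)   # updated quadratic coefficient
    return P, c

P, c = riccati_cost(6, Fraction(1))
print(P, c)  # coefficient of x_0^2 and the multiple of v, for v = 1
```

Under these assumptions the coefficient P stays at 1 at every stage, so each step only adds v to the constant term, which agrees with the stated cost x_0^2 + 6v.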
Find a constant λ and a function ϕ which solve
$$\phi(x) + \lambda = \min_u \left[\tfrac{1}{2}x^2 + u^2 + E\phi(x + u + \epsilon_1)\right].$$
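One way to test a guess for this equation is to evaluate both sides numerically. The sketch below assumes the candidate pair phi(x) = x^2 and lambda = v, suggested by the finite-horizon recursion; the candidate, the grid search, and the helper names are assumptions made for illustration, and a pass here is a check rather than a proof.

```python
import math

def bellman_rhs(x, v, u):
    """1/2 x^2 + u^2 + E[phi(x + u + eps)] for the candidate phi(y) = y^2.

    With E[eps] = 0 and E[eps^2] = v, E[(x + u + eps)^2] = (x + u)^2 + v.
    """
    return 0.5 * x**2 + u**2 + (x + u)**2 + v

def check_candidate(x, v):
    """Compare phi(x) + lambda with the minimised right-hand side."""
    u_star = -x / 2.0  # stationary point of u^2 + (x + u)^2
    # Coarse grid around u_star to confirm it is the minimiser.
    grid = [u_star + k * 0.01 for k in range(-300, 301)]
    best = min(bellman_rhs(x, v, u) for u in grid)
    lhs = x**2 + v     # candidate phi(x) + lambda
    return math.isclose(best, lhs, rel_tol=1e-9, abs_tol=1e-9)

print(all(check_candidate(x, 0.5) for x in (-3.0, 0.0, 1.0, 2.0)))
```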
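Let $P$ be the class of those policies for which every $u_t$ obeys the constraint $(x_t + u_t)^2 \leqslant (0.9)x_t^2$. Show that $E_\pi \phi(x_t) \leqslant x_0^2 + 10v$, for all $\pi \in P$. Find, and prove optimal, a policy which over all $\pi \in P$ minimizes
$$\lim_{h\to\infty} \frac{1}{h} E_\pi\left[\sum_{t=0}^{h-1}\left(\tfrac{1}{2}x_t^2 + u_t^2\right)\right].$$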
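The bound and the long-run average can both be explored through the second-moment recursion E[x_{t+1}^2] = E[(x_t + u_t)^2] + v. The sketch below makes two assumptions that are not in the problem text: it takes phi(x) = x^2, and it uses u_t = -x_t/2 as the candidate policy (it satisfies the constraint, since (x_t + u_t)^2 = x_t^2/4). The function names are hypothetical.

```python
def second_moments(x0, v, shrink, steps):
    """E[x_t^2] for t = 0..steps when E[(x_t+u_t)^2] = shrink * E[x_t^2].

    shrink = 0.9 is the worst case allowed by the constraint;
    shrink = 0.25 corresponds to the assumed policy u_t = -x_t / 2.
    """
    m = x0**2
    out = [m]
    for _ in range(steps):
        m = shrink * m + v
        out.append(m)
    return out

def average_cost_half_policy(x0, v, h):
    """(1/h) * sum_{t<h} E[1/2 x_t^2 + u_t^2] under u_t = -x_t / 2.

    The stage cost is then 1/2 x_t^2 + 1/4 x_t^2 = 3/4 x_t^2.
    """
    ms = second_moments(x0, v, 0.25, h - 1)
    return sum(0.75 * m for m in ms) / h

x0, v = 3.0, 2.0
worst = second_moments(x0, v, 0.9, 500)
print(max(worst) <= x0**2 + 10 * v)            # bound holds along the worst case
print(average_cost_half_policy(x0, v, 20000))  # approaches v as h grows
```

In this sketch the worst-case second moment stays below x_0^2 + 10v, and the average cost of the candidate policy tends to v; the optimality argument itself still has to come from the equation for phi and lambda.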