Two scalar systems have dynamics
\[
x_{t+1} = x_t + u_t + \epsilon_t, \qquad y_{t+1} = y_t + w_t + \eta_t,
\]
where $\{\epsilon_t\}$ and $\{\eta_t\}$ are independent sequences of independent and identically distributed random variables of mean 0 and variance 1. Let
\[
F(x) = \inf_{\pi} E\left[ \sum_{t=0}^{\infty} (x_t^2 + u_t^2) (2/3)^t \,\Big|\, x_0 = x \right],
\]
where $\pi$ is a policy in which $u_t$ depends on only $x_0, \dots, x_t$.
Show that $G(x) = Px^2 + d$ is a solution to the optimality equation satisfied by $F(x)$, for some $P$ and $d$ which you should find.
Find the optimal controls.
State a theorem that justifies $F(x) = G(x)$.
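A hand calculation of P and d can be checked numerically. The sketch below is not part of the question. It assumes the quadratic form P x^2 + d is substituted into the optimality equation with discount factor beta = 2/3 and the inner minimization over u is carried out in closed form, which gives a scalar recursion for P. The names `riccati_step`, `solve_scalar_lq`, `q`, `r`, `noise_var` and `iters` are illustrative, and the iteration starts from P = 0 as an assumption.

```python
def riccati_step(P, beta, q=1.0, r=1.0):
    """One value-iteration step on the quadratic coefficient P.

    Uses min over u of r*u^2 + beta*P*(x+u)^2 = (r*beta*P / (r + beta*P)) * x^2,
    which holds for the dynamics x' = x + u + noise (assumed here).
    """
    return q + r * beta * P / (r + beta * P)


def solve_scalar_lq(beta, noise_var=1.0, q=1.0, r=1.0, iters=200):
    """Iterate the recursion from P = 0 and return (P, d, gain).

    The control implied by the quadratic ansatz is u = -gain * x.
    The constant d solves d = beta * (noise_var * P + d).
    """
    P = 0.0
    for _ in range(iters):
        P = riccati_step(P, beta, q, r)
    gain = beta * P / (r + beta * P)
    d = beta * noise_var * P / (1.0 - beta)
    return P, d, gain


P, d, gain = solve_scalar_lq(beta=2.0 / 3.0)
print(P, d, gain)  # approximately 1.5, 3.0, 0.5
```

The printed values are only a numerical cross-check. The question still asks for the fixed point to be derived from the optimality equation and for the equality of F and G to be justified by a theorem.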
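For each of the two cases (a) $\lambda = 0$ and (b) $\lambda = 1$, find controls $\{u_t, w_t\}$ which minimize
\[
E\left[ \sum_{t=0}^{\infty} (x_t^2 + 2\lambda x_t y_t + y_t^2 + u_t^2 + w_t^2) (2/3 + \lambda/12)^t \,\Big|\, x_0 = x,\ y_0 = y \right].
\]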
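One possible line of attack, offered as a sketch rather than as the intended solution, is to look at how the stage cost and discount factor simplify in each case. The symbols $z_t$ and $v_t$ below are introduced only for illustration and do not appear in the question.

```latex
% Case (a), \lambda = 0: the discount factor is 2/3 and the cost separates
% into one copy of the single-system problem for each of x and y.
\[
(x_t^2 + u_t^2) + (y_t^2 + w_t^2), \qquad \text{discount factor } 2/3 .
\]

% Case (b), \lambda = 1: the discount factor is 2/3 + 1/12 = 3/4 and the
% state enters only through the sum z_t = x_t + y_t.
\[
x_t^2 + 2 x_t y_t + y_t^2 + u_t^2 + w_t^2 = z_t^2 + u_t^2 + w_t^2,
\qquad z_t = x_t + y_t .
\]

% With v_t = u_t + w_t the sum evolves as a single scalar system whose
% noise \epsilon_t + \eta_t has mean 0 and variance 2.
\[
z_{t+1} = z_t + v_t + (\epsilon_t + \eta_t).
\]

% For a given total v_t the control cost is smallest when it is split equally.
\[
u_t^2 + w_t^2 \ge \tfrac{1}{2} v_t^2, \qquad \text{with equality iff } u_t = w_t = \tfrac{1}{2} v_t .
\]
```

Under this reduction, case (b) becomes a scalar problem in $z_t$ with stage cost $z_t^2 + \tfrac{1}{2} v_t^2$, which can be treated with the same quadratic trial solution as the single-system problem.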