Paper 3, Section II, I

Optimization and Control
Part II, 2009

Two scalar systems have dynamics

$$x_{t+1}=x_{t}+u_{t}+\epsilon_{t}, \quad y_{t+1}=y_{t}+w_{t}+\eta_{t},$$

where $\{\epsilon_{t}\}$ and $\{\eta_{t}\}$ are independent sequences of independent and identically distributed random variables of mean 0 and variance 1. Let

$$F(x)=\inf_{\pi} \mathbb{E}\left[\sum_{t=0}^{\infty}\left(x_{t}^{2}+u_{t}^{2}\right)(2/3)^{t} \,\Big|\, x_{0}=x\right],$$

where $\pi$ is a policy in which $u_{t}$ depends only on $x_{0}, \ldots, x_{t}$.

Show that $G(x)=P x^{2}+d$ is a solution to the optimality equation satisfied by $F(x)$, for some $P$ and $d$ which you should find.

Find the optimal controls.

State a theorem that justifies $F(x)=G(x)$.

For each of the two cases (a) $\lambda=0$ and (b) $\lambda=1$, find controls $\{u_{t}, w_{t}\}$ which minimize

$$\mathbb{E}\left[\sum_{t=0}^{\infty}\left(x_{t}^{2}+2\lambda x_{t} y_{t}+y_{t}^{2}+u_{t}^{2}+w_{t}^{2}\right)(2/3+\lambda/12)^{t} \,\Big|\, x_{0}=x,\ y_{0}=y\right].$$
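
A hand derivation of the quadratic solution can be checked numerically. The sketch below is not part of the question. It assumes a scalar system with dynamics x' = x + u + noise, stage cost x^2 + r u^2 and discount factor beta, and it iterates the optimality equation on the coefficients of a quadratic guess P x^2 + d. The function name `solve_scalar_lq` and its parameters `r`, `noise_var` and `iters` are hypothetical names introduced for illustration only.

```python
def solve_scalar_lq(beta, r=1.0, noise_var=1.0, iters=200):
    """Value iteration on the coefficients of G(x) = P*x**2 + d.

    Assumed model (illustrative, not taken verbatim from the question):
      dynamics   x' = x + u + noise, with E[noise] = 0, Var[noise] = noise_var
      stage cost x**2 + r*u**2, discounted by beta per step.
    Returns (P, d, gain), where the minimizing control is u = -gain * x.
    """
    P, d = 0.0, 0.0
    for _ in range(iters):
        b = beta * P
        # Constant term: discounted noise contribution plus carried-over constant
        # (computed with the previous iterate of P).
        d = beta * (P * noise_var + d)
        # Minimizing r*u**2 + b*(x+u)**2 over u leaves (r*b/(r+b)) * x**2.
        P = 1.0 + r * b / (r + b)
    gain = beta * P / (r + beta * P)
    return P, d, gain


# First part of the question: beta = 2/3, unit control cost, unit noise variance.
P, d, gain = solve_scalar_lq(2.0 / 3.0)
print(P, d, gain)
```

With these inputs the iterates should settle near P = 3/2, d = 3 and gain 1/2, which can be compared with the algebraic solution of the optimality equation. One possible use for case (b) rests on a further assumption: that the cost depends on the states only through z = x + y, and that splitting a total control v = u + w equally costs v^2/2. Under that assumption, calling `solve_scalar_lq(0.75, r=0.5, noise_var=2.0)` gives the corresponding coefficients for z.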