Paper 4, Section II, 25K

Optimization and Control
Part II, 2015

Consider the scalar system evolving as

$$x_{t}=x_{t-1}+u_{t-1}+\epsilon_{t}, \quad t=1,2, \ldots,$$

where $\left\{\epsilon_{t}\right\}_{t=1}^{\infty}$ is a white noise sequence with $E \epsilon_{t}=0$ and $E \epsilon_{t}^{2}=v$. It is desired to choose controls $\left\{u_{t}\right\}_{t=0}^{h-1}$ to minimize $E\left[\sum_{t=0}^{h-1}\left(\frac{1}{2} x_{t}^{2}+u_{t}^{2}\right)+x_{h}^{2}\right]$. Show that for $h=6$ the minimal cost is $x_{0}^{2}+6 v$.
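The claimed minimal cost can be sanity-checked numerically; this is a sketch, not a proof. It assumes the value function with s steps to go is quadratic, P_s x^2 + c_s v, and iterates the resulting backward recursion in exact arithmetic. The function name and the symbols P and c are illustrative and do not appear in the question.

```python
from fractions import Fraction


def riccati_cost_coefficients(h, terminal=Fraction(1), state_weight=Fraction(1, 2)):
    """Return (P, c) such that the minimal h-step cost is P * x0**2 + c * v.

    Assumes a quadratic value function P * x**2 + c * v (an assumption of
    this sketch), with terminal cost x_h**2 and stage cost x**2/2 + u**2.
    """
    P = terminal      # coefficient of x^2 in the value function
    c = Fraction(0)   # coefficient of v in the value function
    for _ in range(h):
        # E[P * (x + u + eps)^2] = P * (x + u)^2 + P * v, so the noise adds P * v.
        c = c + P
        # min over u of u^2 + P * (x + u)^2 equals P / (1 + P) * x^2,
        # attained at u = -P * x / (1 + P).
        P = state_weight + P / (1 + P)
    return P, c


P, c = riccati_cost_coefficients(6)
print(P, c)
```

With terminal coefficient 1 the recursion stays at P = 1, so each step contributes exactly v to the cost.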

Find a constant $\lambda$ and a function $\phi$ which solve

$$\phi(x)+\lambda=\min _{u}\left[\frac{1}{2} x^{2}+u^{2}+E \phi\left(x+u+\epsilon_{1}\right)\right]$$
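One way to test a candidate pair is to substitute it and compare both sides. The sketch below tries the candidate $\phi(x)=x^{2}$, $\lambda=v$ with minimizer $u=-x/2$; these are assumptions of the sketch, not given in the question, and the function names are illustrative.

```python
def bellman_rhs(x, u, v):
    """Bracketed expression for the candidate phi(y) = y**2.

    Since E eps = 0 and E eps^2 = v, E phi(x + u + eps) = (x + u)**2 + v.
    """
    return 0.5 * x**2 + u**2 + (x + u) ** 2 + v


def check_candidate(x, v, step=1e-3):
    """Return (equation_holds, is_local_min) for phi(x) = x**2, lambda = v."""
    u_star = -x / 2                      # candidate minimizer (assumed)
    lhs = x**2 + v                       # phi(x) + lambda
    at_min = bellman_rhs(x, u_star, v)
    # Perturbing u in either direction should not decrease the bracket.
    is_local_min = all(
        bellman_rhs(x, u_star + d, v) >= at_min for d in (-step, step)
    )
    return abs(lhs - at_min) < 1e-9, is_local_min


print(check_candidate(3.0, 2.0))
```

Because the bracket is a convex quadratic in $u$, a local minimum is also the global one.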

Let $P$ be the class of those policies for which every $u_{t}$ obeys the constraint $\left(x_{t}+u_{t}\right)^{2} \leqslant(0.9) x_{t}^{2}$. Show that $E_{\pi} \phi\left(x_{t}\right) \leqslant x_{0}^{2}+10 v$, for all $\pi \in P$. Find, and prove optimal, a policy which over all $\pi \in P$ minimizes

$$\lim _{h \rightarrow \infty} \frac{1}{h} E_{\pi}\left[\sum_{t=0}^{h-1}\left(\frac{1}{2} x_{t}^{2}+u_{t}^{2}\right)\right]$$
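Both the bound and the long-run average can be explored through the second moment $m_{t}=E x_{t}^{2}$, which satisfies $m_{t+1}=E\left(x_{t}+u_{t}\right)^{2}+v$ because the noise has zero mean and is uncorrelated with the past. The sketch below is an illustration rather than a proof: it iterates the worst case allowed by the constraint, $m_{t+1}=0.9\, m_{t}+v$, and then evaluates the linear policy $u_{t}=-x_{t}/2$. That policy is an assumed candidate, and the function names and sample values are illustrative.

```python
def second_moments(x0, v, contraction, horizon):
    """Iterate m_{t+1} = contraction * m_t + v, starting from m_0 = x0**2."""
    m = [x0**2]
    for _ in range(horizon):
        m.append(contraction * m[-1] + v)
    return m


def average_cost_half_policy(x0, v, horizon):
    """Average expected stage cost over `horizon` steps under u_t = -x_t / 2.

    Then (x_t + u_t)^2 = x_t^2 / 4, which satisfies the 0.9 constraint, and
    the expected stage cost is (1/2 + 1/4) * m_t.
    """
    m = second_moments(x0, v, 0.25, horizon)
    return sum(0.75 * mt for mt in m[:horizon]) / horizon


x0, v = 5.0, 2.0                               # illustrative sample values
worst = second_moments(x0, v, 0.9, 200)        # constraint tight at every step
print(max(worst) <= x0**2 + 10 * v)            # bound on E phi(x_t), phi(x) = x^2
print(average_cost_half_policy(x0, v, 10_000)) # approaches v as the horizon grows
```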