Paper 3, Section II, J

Optimization and Control
Part II, 2010

Consider an infinite-horizon controlled Markov process having per-period costs c(x,u)0c(x, u) \geqslant 0, where xXx \in \mathcal{X} is the state of the system, and uUu \in \mathcal{U} is the control. Costs are discounted at rate β(0,1]\beta \in(0,1], so that the objective to be minimized is

E[t0βtc(Xt,ut)X0=x]\mathbb{E}\left[\sum_{t \geqslant 0} \beta^{t} c\left(X_{t}, u_{t}\right) \mid X_{0}=x\right]

What is meant by a policy π\pi for this problem?

Let L\mathcal{L} denote the dynamic programming operator

Lf(x)infuU{c(x,u)+βE[f(X1)X0=x,u0=u]}\mathcal{L} f(x) \equiv \inf _{u \in \mathcal{U}}\left\{c(x, u)+\beta \mathbb{E}\left[f\left(X_{1}\right) \mid X_{0}=x, u_{0}=u\right]\right\}

Further, let FF denote the value of the optimal control problem:

F(x)=infπEπ[t0βtc(Xt,ut)X0=x]F(x)=\inf _{\pi} \mathbb{E}^{\pi}\left[\sum_{t \geqslant 0} \beta^{t} c\left(X_{t}, u_{t}\right) \mid X_{0}=x\right]

where the infimum is taken over all policies π\pi, and Eπ\mathbb{E}^{\pi} denotes expectation under policy π\pi. Show that the functions FtF_{t} defined by

Ft+1=LFt(t0),F00F_{t+1}=\mathcal{L} F_{t} \quad(t \geqslant 0), \quad F_{0} \equiv 0

increase to a limit F[0,].F_{\infty} \in[0, \infty] . Prove that FFF_{\infty} \leqslant F. Prove that F=LFF=\mathcal{L} F

Suppose that Φ=LΦ0\Phi=\mathcal{L} \Phi \geqslant 0. Prove that ΦF\Phi \geqslant F.

[You may assume that there is a function u:XUu_{*}: \mathcal{X} \rightarrow \mathcal{U} such that

LΦ(x)=c(x,u(x))+βE[Φ(X1)X0=x,u0=u(x)]\mathcal{L} \Phi(x)=c\left(x, u_{*}(x)\right)+\beta \mathbb{E}\left[\Phi\left(X_{1}\right) \mid X_{0}=x, u_{0}=u_{*}(x)\right]

though the result remains true without this simplifying assumption.]