Consider an infinite-horizon controlled Markov process having per-period costs c(x, u) ⩾ 0, where x ∈ X is the state of the system and u ∈ U is the control. Costs are discounted at rate β ∈ (0, 1], so that the objective to be minimized is
\[
E\Bigl[\,\sum_{t\geqslant 0}\beta^{t}c(X_t,u_t)\,\Bigm|\,X_0=x\Bigr].
\]
What is meant by a policy π for this problem?
Let L denote the dynamic programming operator
\[
Lf(x) \equiv \inf_{u\in U}\bigl\{\,c(x,u)+\beta\,E[f(X_1)\mid X_0=x,\,u_0=u]\,\bigr\}.
\]
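When the state and control spaces are finite, the operator L can be written down directly. The sketch below assumes tabular costs c[x, u] and transition matrices P[u, x, y]; all names and shapes are illustrative and not part of the question.

```python
import numpy as np

def bellman_operator(f, c, P, beta):
    """One application of L: (Lf)(x) = inf_u { c(x,u) + beta * E[f(X_1) | X_0=x, u_0=u] }.

    f    : (n_states,)                     current value function
    c    : (n_states, n_actions)           per-period costs c(x, u) >= 0
    P    : (n_actions, n_states, n_states) P[u, x, y] = P(X_1 = y | X_0 = x, u_0 = u)
    beta : discount factor in (0, 1]
    """
    # Q[x, u] = c(x, u) + beta * sum_y P[u, x, y] * f[y]
    Q = c + beta * np.einsum('uxy,y->xu', P, f)
    # over a finite control set the infimum is a minimum
    return Q.min(axis=1)
```

With f ≡ 0 this returns min_u c(x, u) at each state, i.e. the first value-iteration step F_1 below.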
Further, let F denote the value of the optimal control problem:
\[
F(x)=\inf_{\pi} E^{\pi}\Bigl[\,\sum_{t\geqslant 0}\beta^{t}c(X_t,u_t)\,\Bigm|\,X_0=x\Bigr],
\]
where the infimum is taken over all policies π, and Eπ denotes expectation under policy π. Show that the functions Ft defined by
\[
F_{t+1}=LF_t \quad (t\geqslant 0), \qquad F_0\equiv 0,
\]
increase to a limit F_∞ ∈ [0, ∞]. Prove that F_∞ ⩽ F. Prove that F = LF.
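The monotone convergence of the iterates F_t is easy to observe numerically. The sketch below assumes a finite MDP with β < 1 (so the limit is finite and the fixed-point equation F = LF can be checked to numerical precision); the data are illustrative, not part of the question.

```python
import numpy as np

def value_iteration(c, P, beta, n_iter=200):
    """Iterate F_{t+1} = L F_t starting from F_0 = 0.

    Since c >= 0 and L is monotone, the iterates are nondecreasing in t.
    Returns the final iterate and the history F_0, F_1, ..., F_{n_iter}.
    """
    F = np.zeros(c.shape[0])  # F_0 = 0
    history = [F]
    for _ in range(n_iter):
        Q = c + beta * np.einsum('uxy,y->xu', P, F)  # Q(x,u) = c(x,u) + beta E[F(X_1)]
        F = Q.min(axis=1)                            # F_{t+1}(x) = (L F_t)(x)
        history.append(F)
    return F, history
```

On any such example one can verify that the history is pointwise nondecreasing and that the final iterate satisfies F = LF up to the contraction error β^n.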
Suppose that Φ = LΦ ⩾ 0. Prove that Φ ⩾ F.
[You may assume that there is a function u∗:X→U such that
\[
L\Phi(x)=c\bigl(x,u^{*}(x)\bigr)+\beta\,E[\Phi(X_1)\mid X_0=x,\,u_0=u^{*}(x)],
\]
though the result remains true without this simplifying assumption.]
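The nonnegativity hypothesis on Φ is not vacuous: when β = 1 the operator L can have many fixed points (shifting F by any constant gives another one), and the result says the nonnegative fixed points all dominate F. Here is a small illustrative check, using a made-up two-state chain whose second state is absorbing and cost-free so that F is finite even without discounting:

```python
import numpy as np

# Illustrative two-state MDP with beta = 1.
# Costs c[x, u] and transitions P[u, x, y]; state 1 is absorbing with zero
# cost, so the optimal value is F = (1, 0).
c = np.array([[1.0, 3.0],   # state 0: action 0 costs 1, action 1 costs 3
              [0.0, 0.0]])  # state 1: zero cost under both actions
P = np.array([[[0.0, 1.0],  # action 0 sends state 0 to the absorbing state 1
               [0.0, 1.0]],
              [[1.0, 0.0],  # action 1 keeps state 0 at state 0
               [0.0, 1.0]]])

def L(f, beta=1.0):
    Q = c + beta * np.einsum('uxy,y->xu', P, f)
    return Q.min(axis=1)

F = np.array([1.0, 0.0])
assert np.allclose(L(F), F)          # F = LF
for k in (0.5, 2.0, 10.0):
    Phi = F + k                      # F shifted by a constant k >= 0
    assert np.allclose(L(Phi), Phi)  # Phi = L Phi and Phi >= 0 ...
    assert np.all(Phi >= F)          # ... and indeed Phi >= F
```

With β < 1 this degeneracy disappears: L is then a contraction on bounded functions and F is its unique fixed point.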