Paper 2, Section II, K

Consider a Markov decision problem with finite state space $X$ , value function $F$ and dynamic programming equation $F=\mathcal{L} F$ , where

(\mathcal{L} \phi)(i)=\min _{a \in\{0,1\}}\left\{c(i, a)+\beta \sum_{j \in X} P_{i j}(a) \phi(j)\right\} .

Suppose $0<\beta<1$ , and $|c(i, a)| \leqslant B$ for all $i \in X, a \in\{0,1\}$ . Prove there exists a deterministic stationary Markov policy that is optimal, explaining what the italicised words mean.

Let $F_{n}=\mathcal{L}^{n} F_{0}$ , where $F_{0}=0$ , and $M_{n}=\max _{i \in X}\left|F(i)-F_{n}(i)\right|$ . Prove that

M_{n} \leqslant \beta M_{n-1} \leqslant \beta^{n} B /(1-\beta) .

Deduce that the value iteration algorithm converges to an optimal policy in a finite number of iterations.