Paper 2, Section II, K

Optimization and Control
Part II, 2016

Consider a Markov decision problem with finite state space $X$, value function $F$ and dynamic programming equation $F=\mathcal{L} F$, where

\[
(\mathcal{L} \phi)(i)=\min_{a \in\{0,1\}}\left\{c(i, a)+\beta \sum_{j \in X} P_{i j}(a)\, \phi(j)\right\}.
\]

Suppose $0<\beta<1$, and $|c(i, a)| \leqslant B$ for all $i \in X$, $a \in\{0,1\}$. Prove there exists a \emph{deterministic stationary Markov} policy that is optimal, explaining what the italicised words mean.
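As an illustration of the operator $\mathcal{L}$ and of how a deterministic stationary Markov policy arises from it, here is a minimal numerical sketch. It assumes a hypothetical finite instance in which the costs $c(i,a)$ are stored in an array \texttt{c} of shape $(|X|, 2)$ and the transition matrices $P(a)$ in an array \texttt{P} of shape $(2, |X|, |X|)$; the names \texttt{bellman\_operator}, \texttt{c}, \texttt{P} and \texttt{beta} are not part of the question.

\begin{verbatim}
import numpy as np

def bellman_operator(phi, c, P, beta):
    """Apply (L phi)(i) = min_a { c(i,a) + beta * sum_j P_ij(a) phi(j) }.

    c    : array (|X|, 2) of costs c(i, a)          (hypothetical instance)
    P    : array (2, |X|, |X|) of matrices P(a)     (hypothetical instance)
    beta : discount factor with 0 < beta < 1
    """
    # Q[i, a] = c(i, a) + beta * sum_j P_ij(a) phi(j)
    Q = c + beta * np.stack([P[a] @ phi for a in (0, 1)], axis=1)
    # Minimising over a gives (L phi)(i); the argmin is a deterministic,
    # stationary, Markov choice of action depending only on the current state i.
    return Q.min(axis=1), Q.argmin(axis=1)
\end{verbatim}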

Let $F_{n}=\mathcal{L}^{n} F_{0}$, where $F_{0}=0$, and $M_{n}=\max_{i \in X}\left|F(i)-F_{n}(i)\right|$. Prove that

\[
M_{n} \leqslant \beta M_{n-1} \leqslant \beta^{n} B /(1-\beta).
\]

Deduce that the value iteration algorithm converges to an optimal policy in a finite number of iterations.
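The following self-contained sketch runs value iteration $F_{n}=\mathcal{L}^{n} F_{0}$ from $F_{0}=0$ and stops once the guaranteed bound $\beta^{n} B /(1-\beta)$ on $M_{n}$ is tiny; since there are only finitely many deterministic stationary policies, the greedy policy read off from $F_{n}$ must eventually coincide with an optimal one. The function \texttt{value\_iteration}, the tolerance \texttt{tol}, and the 3-state example data are all hypothetical illustrations, not part of the question.

\begin{verbatim}
import numpy as np

def value_iteration(c, P, beta, tol=1e-9, n_max=10_000):
    """Iterate F_n = L^n F_0 with F_0 = 0 and read off a greedy policy.

    M_n <= beta^n * B / (1 - beta) -> 0, and with finitely many
    deterministic stationary policies the greedy policy stabilises at an
    optimal one after finitely many iterations.
    """
    F = np.zeros(c.shape[0])                          # F_0 = 0
    B = np.abs(c).max()                               # |c(i, a)| <= B
    for n in range(1, n_max + 1):
        Q = c + beta * np.stack([P[a] @ F for a in (0, 1)], axis=1)
        F, policy = Q.min(axis=1), Q.argmin(axis=1)   # F_n and greedy actions
        if beta**n * B / (1 - beta) < tol:            # bound on M_n
            break
    return F, policy

# Hypothetical 3-state instance: costs c[i, a] and transition matrices P[a].
c = np.array([[1.0, 2.0], [0.5, 0.3], [2.0, 1.5]])
P = np.array([[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
              [[0.5, 0.5, 0.0], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]])
print(value_iteration(c, P, beta=0.9))
\end{verbatim}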