2.II.29I

Explain what is meant by a time-homogeneous discrete time Markov decision problem.

What is the positive programming case?

A discrete time Markov decision problem has state space $\{0,1, \ldots, N\}$ . In state $i, i \neq 0, N$ , two actions are possible. We may either stop and obtain a terminal reward $r(i) \geqslant 0$ , or may continue, in which case the subsequent state is equally likely to be $i-1$ or $i+1$ . In states 0 and $N$ stopping is automatic (with terminal rewards $r(0)$ and $r(N)$ respectively). Starting in state $i$ , denote by $V_{n}(i)$ and $V(i)$ the maximal expected terminal reward that can be obtained over the first $n$ steps and over the infinite horizon, respectively. Prove that $\lim _{n \rightarrow \infty} V_{n}=V$ .

Prove that $V$ is the smallest concave function such that $V(i) \geqslant r(i)$ for all $i$ .

Describe an optimal policy.

Suppose $r(0), \ldots, r(N)$ are distinct numbers. Show that the optimal policy is unique, or give a counter-example.