2.II.29I

Optimization and Control
Part II, 2005

Explain what is meant by a time-homogeneous discrete time Markov decision problem.

What is the positive programming case?

A discrete time Markov decision problem has state space {0,1,,N}\{0,1, \ldots, N\}. In state i,i0,Ni, i \neq 0, N, two actions are possible. We may either stop and obtain a terminal reward r(i)0r(i) \geqslant 0, or may continue, in which case the subsequent state is equally likely to be i1i-1 or i+1i+1. In states 0 and NN stopping is automatic (with terminal rewards r(0)r(0) and r(N)r(N) respectively). Starting in state ii, denote by Vn(i)V_{n}(i) and V(i)V(i) the maximal expected terminal reward that can be obtained over the first nn steps and over the infinite horizon, respectively. Prove that limnVn=V\lim _{n \rightarrow \infty} V_{n}=V.

Prove that VV is the smallest concave function such that V(i)r(i)V(i) \geqslant r(i) for all ii.

Describe an optimal policy.

Suppose r(0),,r(N)r(0), \ldots, r(N) are distinct numbers. Show that the optimal policy is unique, or give a counter-example.