2.II.29I
Explain what is meant by a time-homogeneous discrete time Markov decision problem.
What is the positive programming case?
A discrete time Markov decision problem has state space . In state , two actions are possible. We may either stop and obtain a terminal reward , or may continue, in which case the subsequent state is equally likely to be or . In states 0 and stopping is automatic (with terminal rewards and respectively). Starting in state , denote by and the maximal expected terminal reward that can be obtained over the first steps and over the infinite horizon, respectively. Prove that .
Prove that is the smallest concave function such that for all .
Describe an optimal policy.
Suppose are distinct numbers. Show that the optimal policy is unique, or give a counter-example.