3.II.28I

Optimization and Control
Part II, 2007

Let $P$ be a discrete-time controllable dynamical system (or Markov decision process) with countable state-space $S$ and action-space $A$. Consider the $n$-horizon dynamic optimization problem with instantaneous costs $c(k, x, a)$, on choosing action $a$ in state $x$ at time $k \leqslant n-1$, with terminal cost $C(x)$ in state $x$ at time $n$. Explain what is meant by a Markov control and how the choice of a control gives rise to a time-inhomogeneous Markov chain.
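For illustration only (a minimal sketch, with `step` and `u` as hypothetical stand-ins supplied by the user): a Markov control is a map $u(k, x) \mapsto a$ that depends on the past only through the current time $k$ and state $x$; fixing such a $u$ makes the controlled process a Markov chain whose transition law at time $k$ is $P(x, \cdot\,; u(k, x))$, and hence depends on $k$ (time-inhomogeneous).

```python
def simulate(step, u, x0, n):
    """Run the chain for n steps under the Markov control u.

    step(k, x, a) -- samples X_{k+1} given action a in state x at time k
    u(k, x)       -- a Markov control: the action depends only on (k, x)
    """
    x, path = x0, [x0]
    for k in range(n):
        a = u(k, x)        # action chosen from the current time and state alone
        x = step(k, x, a)  # next state drawn from P(x, . ; a), which varies with k
        path.append(x)
    return path
```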

Suppose we can find a bounded function $V$ and a Markov control $u^{*}$ such that

V(k,x)(c+PV)(k,x,a),0kn1,xS,aAV(k, x) \leqslant(c+P V)(k, x, a), \quad 0 \leqslant k \leqslant n-1, \quad x \in S, \quad a \in A

with equality when $a = u^{*}(k, x)$, and such that $V(n, x) = C(x)$ for all $x$. Here $PV(k, x, a)$ denotes the expected value of $V(k+1, X_{k+1})$, given that we choose action $a$ in state $x$ at time $k$. Show that $u^{*}$ is an optimal Markov control.
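In practice such a pair $(V, u^{*})$ is produced by backward induction: set $V(n, \cdot) = C$, then for $k = n-1, \dots, 0$ let $u^{*}(k, x)$ attain the minimum over $a$ of $(c + PV)(k, x, a)$. A sketch for a finite toy instance, assuming the user supplies hypothetical `c`, `P`, `C`:

```python
def backward_induction(states, actions, c, P, C, n):
    """Compute V(k, x) = min_a (c + PV)(k, x, a) and a minimising Markov control.

    c(k, x, a) -- instantaneous cost
    P(k, x, a) -- dict {y: prob} giving the law of X_{k+1}
    C(x)       -- terminal cost
    Returns (V, u) as dicts keyed by (k, x).
    """
    V = {(n, x): C(x) for x in states}   # terminal condition V(n, .) = C
    u = {}
    for k in range(n - 1, -1, -1):       # work backwards from the horizon
        for x in states:
            best_val, best_a = None, None
            for a in actions:
                # q is (c + PV)(k, x, a), the cost-to-go of playing a now
                q = c(k, x, a) + sum(p * V[(k + 1, y)]
                                     for y, p in P(k, x, a).items())
                if best_val is None or q < best_val:
                    best_val, best_a = q, a
            V[(k, x)], u[(k, x)] = best_val, best_a
    return V, u
```

By construction this $V$ satisfies the displayed inequality with equality at $a = u^{*}(k, x)$, which is exactly the hypothesis of the verification result above.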

A well-shuffled pack of cards is placed face-down on the table. The cards are turned over one by one until none are left. Exactly once you may place a bet of £1000 on the event that the next two cards will be red. How should you choose the moment to bet? Justify your answer.
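A small verification sketch (not part of the question): with $r$ red and $b$ black cards remaining, the chance that the next two cards are red is $r(r-1)/\big((r+b)(r+b-1)\big)$, and a direct computation shows this quantity is a martingale as cards are revealed; by optional stopping, every betting rule then wins with the same probability $26 \cdot 25 / (52 \cdot 51) = 25/102$, so the timing is irrelevant. The exact recursion below confirms that waiting never improves on betting immediately:

```python
from fractions import Fraction
from functools import lru_cache

def f(r, b):
    """P(next two cards red) with r red and b black cards remaining."""
    n = r + b
    return Fraction(r * (r - 1), n * (n - 1))

@lru_cache(maxsize=None)
def V(r, b):
    """Best achievable win probability: bet now, or turn a card and continue."""
    if r + b == 2:
        return f(r, b)                         # last chance: must bet now
    cont = Fraction(0)
    if r > 0:                                  # next card red w.p. r/(r+b)
        cont += Fraction(r, r + b) * V(r - 1, b)
    if b > 0:                                  # next card black w.p. b/(r+b)
        cont += Fraction(b, r + b) * V(r, b - 1)
    return max(f(r, b), cont)

print(V(26, 26), f(26, 26))                    # both 25/102
```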