3.II.28I

Optimization and Control
Part II, 2007

Let $P$ be a discrete-time controllable dynamical system (or Markov decision process) with countable state-space $S$ and action-space $A$. Consider the $n$-horizon dynamic optimization problem with instantaneous costs $c(k, x, a)$, on choosing action $a$ in state $x$ at time $k \leqslant n-1$, with terminal cost $C(x)$ in state $x$ at time $n$. Explain what is meant by a Markov control and how the choice of a control gives rise to a time-inhomogeneous Markov chain.
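For illustration only (a minimal sketch, with `step` and `u` as hypothetical stand-ins supplied by the user): a Markov control is a map $u(k, x) \mapsto a$ that depends on the past only through the current time $k$ and state $x$; fixing such a $u$ makes the controlled process a Markov chain whose transition law at time $k$ is $P(x, \cdot\,; u(k, x))$, and hence depends on $k$ (time-inhomogeneous).

```python
def simulate(step, u, x0, n):
    """Run the chain for n steps under the Markov control u.

    step(k, x, a) -- samples X_{k+1} given action a in state x at time k
    u(k, x)       -- a Markov control: the action depends only on (k, x)
    """
    x, path = x0, [x0]
    for k in range(n):
        a = u(k, x)        # action chosen from the current time and state alone
        x = step(k, x, a)  # next state drawn from P(x, . ; a), which varies with k
        path.append(x)
    return path
```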

Suppose we can find a bounded function $V$ and a Markov control $u^{*}$ such that

V(k,x)(c+PV)(k,x,a),0kn1,xS,aAV(k, x) \leqslant(c+P V)(k, x, a), \quad 0 \leqslant k \leqslant n-1, \quad x \in S, \quad a \in A

with equality when $a = u^{*}(k, x)$, and such that $V(n, x) = C(x)$ for all $x$. Here $PV(k, x, a)$ denotes the expected value of $V(k+1, X_{k+1})$, given that we choose action $a$ in state $x$ at time $k$. Show that $u^{*}$ is an optimal Markov control.
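In practice such a pair $(V, u^{*})$ is produced by backward induction: set $V(n, \cdot) = C$, then for $k = n-1, \dots, 0$ let $u^{*}(k, x)$ attain the minimum over $a$ of $(c + PV)(k, x, a)$. A sketch for a finite toy instance, assuming the user supplies hypothetical `c`, `P`, `C`:

```python
def backward_induction(states, actions, c, P, C, n):
    """Compute V(k, x) = min_a (c + PV)(k, x, a) and a minimising Markov control.

    c(k, x, a) -- instantaneous cost
    P(k, x, a) -- dict {y: prob} giving the law of X_{k+1}
    C(x)       -- terminal cost
    Returns (V, u) as dicts keyed by (k, x).
    """
    V = {(n, x): C(x) for x in states}   # terminal condition V(n, .) = C
    u = {}
    for k in range(n - 1, -1, -1):       # work backwards from the horizon
        for x in states:
            best_val, best_a = None, None
            for a in actions:
                # q is (c + PV)(k, x, a), the cost-to-go of playing a now
                q = c(k, x, a) + sum(p * V[(k + 1, y)]
                                     for y, p in P(k, x, a).items())
                if best_val is None or q < best_val:
                    best_val, best_a = q, a
            V[(k, x)], u[(k, x)] = best_val, best_a
    return V, u
```

By construction this $V$ satisfies the displayed inequality with equality at $a = u^{*}(k, x)$, which is exactly the hypothesis of the verification result above.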

A well-shuffled pack of cards is placed face-down on the table. The cards are turned over one by one until none are left. Exactly once you may place a bet of £1000 on the event that the next two cards will be red. How should you choose the moment to bet? Justify your answer.
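A small verification sketch (not part of the question): with $r$ red and $b$ black cards remaining, the chance that the next two cards are red is $r(r-1)/\big((r+b)(r+b-1)\big)$, and a direct computation shows this quantity is a martingale as cards are revealed; by optional stopping, every betting rule then wins with the same probability $26 \cdot 25 / (52 \cdot 51) = 25/102$, so the timing is irrelevant. The exact recursion below confirms that waiting never improves on betting immediately:

```python
from fractions import Fraction
from functools import lru_cache

def f(r, b):
    """P(next two cards red) with r red and b black cards remaining."""
    n = r + b
    return Fraction(r * (r - 1), n * (n - 1))

@lru_cache(maxsize=None)
def V(r, b):
    """Best achievable win probability: bet now, or turn a card and continue."""
    if r + b == 2:
        return f(r, b)                         # last chance: must bet now
    cont = Fraction(0)
    if r > 0:                                  # next card red w.p. r/(r+b)
        cont += Fraction(r, r + b) * V(r - 1, b)
    if b > 0:                                  # next card black w.p. b/(r+b)
        cont += Fraction(b, r + b) * V(r, b - 1)
    return max(f(r, b), cont)

print(V(26, 26), f(26, 26))                    # both 25/102
```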