3.II.28I
Let $(X_t)_{t\ge 0}$ be a discrete-time controllable dynamical system (or Markov decision process) with countable state-space $S$ and action-space $A$. Consider the $h$-horizon dynamic optimization problem with instantaneous costs $c(x,a)$, on choosing action $a$ in state $x$ at time $t$, with terminal cost $C(x)$, in state $x$ at time $h$. Explain what is meant by a Markov control and how the choice of a control gives rise to a time-inhomogeneous Markov chain.
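As an illustration of the last sentence (the transition kernel $P(x,a,y)$ and the map $u$ are notation introduced here, not fixed by the question): a Markov control is a function $u:\{0,1,\dots,h-1\}\times S\to A$ prescribing the action $u(t,x)$ to be taken in state $x$ at time $t$, and under such a control the state process moves according to
\[
\mathbb{P}\bigl(X_{t+1}=y \mid X_t=x\bigr) = P\bigl(x,\,u(t,x),\,y\bigr), \qquad 0 \le t \le h-1,
\]
so the one-step transition matrix depends on $t$ through $u(t,\cdot)$: this is the time-inhomogeneous Markov chain referred to above.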
Suppose we can find a bounded function $V(t,x)$ and a Markov control $u^*$ such that
\[
V(t,x) \le c(x,a) + \mathbb{E}^{a}_{t,x}\, V(t+1, X_{t+1}), \qquad 0 \le t \le h-1,\ x \in S,\ a \in A,
\]
with equality when $a = u^*(t,x)$, and such that $V(h,x) = C(x)$ for all $x$. Here $\mathbb{E}^{a}_{t,x}$ denotes the expected value of $V(t+1, X_{t+1})$, given that we choose action $a$ in state $x$ at time $t$. Show that $u^*$ is an optimal Markov control.
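For intuition, a pair $(V,u^*)$ of this kind is exactly what finite-horizon backward induction produces. A minimal computational sketch, assuming the MDP is specified by a finite transition array P[x, a, y] and cost arrays c[x, a], C[x] (the names, shapes and function below are assumptions made here, not part of the question):
\begin{verbatim}
import numpy as np

def backward_induction(P, c, C, h):
    """Finite-horizon dynamic programming.

    P : (nx, na, nx) array, P[x, a, y] = transition probability
    c : (nx, na) array, cost of choosing action a in state x
    C : (nx,) array, terminal cost
    h : horizon

    Returns V of shape (h+1, nx) and a Markov control u of shape (h, nx);
    by construction V[t, x] = c[x, u[t, x]] + E V(t+1, X_{t+1}) under
    u[t, x], and V[t, x] <= c[x, a] + E^a V(t+1, X_{t+1}) for every a.
    """
    nx, na, _ = P.shape
    V = np.zeros((h + 1, nx))
    u = np.zeros((h, nx), dtype=int)
    V[h] = C
    for t in range(h - 1, -1, -1):
        Q = c + P @ V[t + 1]          # Q[x, a] = c(x, a) + E^a_x V(t+1, .)
        u[t] = Q.argmin(axis=1)       # action attaining the minimum
        V[t] = Q.min(axis=1)          # optimal expected cost-to-go
    return V, u
\end{verbatim}
The verification step asked for in the question runs the other way: any bounded $(V,u^*)$ satisfying the displayed conditions, however obtained, already certifies that $u^*$ is optimal.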
A well-shuffled pack of cards is placed face-down on the table. The cards are turned over one by one until none are left. Exactly once you may place a bet on the event that the next two cards to be turned over will both be red. How should you choose the moment to bet? Justify your answer.
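A numerical sanity check of the card problem (a sketch only, not part of the question; the state description and names below are chosen here): take the state to be the pair $(r,b)$ of red and black cards still face-down, note that betting now wins with probability $r(r-1)/\bigl((r+b)(r+b-1)\bigr)$, and compare this with the value of waiting by backward induction.
\begin{verbatim}
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def value(r, b):
    """Best achievable probability of winning the bet when r red and
    b black cards remain face-down and the bet has not yet been placed."""
    n = r + b
    if n < 2:
        return Fraction(0)                        # too few cards left to win
    bet_now = Fraction(r * (r - 1), n * (n - 1))  # next two cards both red
    wait = Fraction(0)                            # turn a card, bet later
    if r > 0:
        wait += Fraction(r, n) * value(r - 1, b)
    if b > 0:
        wait += Fraction(b, n) * value(r, b - 1)
    return max(bet_now, wait)

# Full pack: the value equals the bet-now probability 26*25/(52*51) = 25/102.
print(value(26, 26) == Fraction(25, 102))         # True
\end{verbatim}
The computation indicates that whenever at least three cards remain face-down the two branches of the maximum agree exactly, so every rule that places the bet while at least two cards remain wins with the same probability $25/102$; this is the dynamic-programming (equivalently, optional-stopping) fact the question asks you to justify.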