B3.14

Consider a scalar system with $x_{t+1}=\left(x_{t}+u_{t}\right) \xi_{t}$ , where $\xi_{0}, \xi_{1}, \ldots$ is a sequence of independent random variables, uniform on the interval $[-a, a]$ , with $a \leqslant 1$ . We wish to choose $u_{0}, \ldots, u_{h-1}$ to minimize the expected value of

\sum_{t=0}^{h-1}\left(c+x_{t}^{2}+u_{t}^{2}\right)+3 x_{h}^{2},

where $u_{t}$ is chosen knowing $x_{t}$ but not $\xi_{t}$ . Prove that the minimal expected cost can be written $V_{h}\left(x_{0}\right)=h c+x_{0}^{2} \Pi_{h}$ and derive a recurrence for calculating $\Pi_{1}, \ldots, \Pi_{h}$ .

How does your answer change if $u_{t}$ is constrained to lie in the set $\mathcal{U}\left(x_{t}\right)=\{u:$ $\left.\left|u+x_{t}\right|<\left|x_{t}\right|\right\} ?$

Consider a stopping problem for which there are two options in state $x_{t}, t \geqslant 0$ :

(1) stop: paying a terminal cost $3 x_{t}^{2}$ ; no further costs are incurred;

(2) continue: choosing $u_{t} \in \mathcal{U}\left(x_{t}\right)$ , paying $c+u_{t}^{2}+x_{t}^{2}$ , and moving to state $x_{t+1}=\left(x_{t}+u_{t}\right) \xi_{t} .$

Consider the problem of minimizing total expected cost subject to the constraint that no more than $h$ continuation steps are allowed. Suppose $a=1$ . Show that an optimal policy stops if and only if either $h$ continuation steps have already been taken or $x^{2} \leqslant 2 c / 3$ .

[Hint: Use induction on $h$ to show that a one-step-look-ahead rule is optimal. You should not need to find the optimal $u_{t}$ for the continuation steps.]