B3.14

Optimization and Control
Part II, 2002

Consider a scalar system with xt+1=(xt+ut)ξtx_{t+1}=\left(x_{t}+u_{t}\right) \xi_{t}, where ξ0,ξ1,\xi_{0}, \xi_{1}, \ldots is a sequence of independent random variables, uniform on the interval [a,a][-a, a], with a1a \leqslant 1. We wish to choose u0,,uh1u_{0}, \ldots, u_{h-1} to minimize the expected value of

t=0h1(c+xt2+ut2)+3xh2,\sum_{t=0}^{h-1}\left(c+x_{t}^{2}+u_{t}^{2}\right)+3 x_{h}^{2},

where utu_{t} is chosen knowing xtx_{t} but not ξt\xi_{t}. Prove that the minimal expected cost can be written Vh(x0)=hc+x02ΠhV_{h}\left(x_{0}\right)=h c+x_{0}^{2} \Pi_{h} and derive a recurrence for calculating Π1,,Πh\Pi_{1}, \ldots, \Pi_{h}.

How does your answer change if utu_{t} is constrained to lie in the set U(xt)={u:\mathcal{U}\left(x_{t}\right)=\{u: u+xt<xt}?\left.\left|u+x_{t}\right|<\left|x_{t}\right|\right\} ?

Consider a stopping problem for which there are two options in state xt,t0x_{t}, t \geqslant 0 :

(1) stop: paying a terminal cost 3xt23 x_{t}^{2}; no further costs are incurred;

(2) continue: choosing utU(xt)u_{t} \in \mathcal{U}\left(x_{t}\right), paying c+ut2+xt2c+u_{t}^{2}+x_{t}^{2}, and moving to state xt+1=(xt+ut)ξt.x_{t+1}=\left(x_{t}+u_{t}\right) \xi_{t} .

Consider the problem of minimizing total expected cost subject to the constraint that no more than hh continuation steps are allowed. Suppose a=1a=1. Show that an optimal policy stops if and only if either hh continuation steps have already been taken or x22c/3x^{2} \leqslant 2 c / 3.

[Hint: Use induction on hh to show that a one-step-look-ahead rule is optimal. You should not need to find the optimal utu_{t} for the continuation steps.]