Paper 2, Section II, 30K

Optimisation and Control
Part II, 2018

(a) A ball may be in one of $n$ boxes. A search of the $i^{\text{th}}$ box costs $c_i > 0$ and finds the ball with probability $\alpha_i > 0$ if the ball is in that box. We are given initial probabilities $(P_i^1,\ i=1,2,\ldots,n)$ that the ball is in the $i^{\text{th}}$ box.

Show that the policy which at each time $t = 1, 2, \ldots$ searches the box with the maximal value of $\alpha_i P_i^t / c_i$ minimises the expected searching cost until the ball is found, where $P_i^t$ is the probability (given everything that has occurred up to time $t$) that the ball is in box $i$.
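As a concrete illustration of the index rule (a minimal simulation sketch, not part of the question; the function name `search_with_index_policy` and all numerical values are assumptions chosen for the example): after an unsuccessful search of box $j$ the posterior updates by Bayes' rule to $P_j^{t+1} = (1-\alpha_j)P_j^t/(1-\alpha_j P_j^t)$ and $P_i^{t+1} = P_i^t/(1-\alpha_j P_j^t)$ for $i \neq j$, which the sketch below uses.

```python
import random

def search_with_index_policy(prior, alpha, c, ball, rng):
    """Simulate part (a)'s rule: always search the box with the largest
    alpha_i * P_i / c_i, updating the posterior after each failed search.
    Returns the total search cost paid until the ball is found."""
    p = list(prior)
    total_cost = 0.0
    while True:
        # Greedy index rule: box maximising alpha_i * P_i^t / c_i.
        j = max(range(len(p)), key=lambda i: alpha[i] * p[i] / c[i])
        total_cost += c[j]
        # A search of box j finds the ball with probability alpha_j if it is there.
        if j == ball and rng.random() < alpha[j]:
            return total_cost
        # Unsuccessful search: Bayes update of the posterior P^{t+1}.
        miss = 1.0 - alpha[j] * p[j]
        p = [(1.0 - alpha[i]) * p[i] / miss if i == j else p[i] / miss
             for i in range(len(p))]

# Monte Carlo estimate of the expected search cost under the index policy
# (illustrative numbers only).
alpha = [0.5, 0.3, 0.8]
c = [1.0, 2.0, 1.5]
prior = [0.2, 0.5, 0.3]
rng = random.Random(1)
costs = [search_with_index_policy(prior, alpha, c,
                                  rng.choices(range(3), weights=prior)[0], rng)
         for _ in range(10_000)]
print(sum(costs) / len(costs))
```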

(b) Next suppose that a reward $R_i > 0$ is earned if the ball is found in the $i^{\text{th}}$ box. Suppose also that we may decide to stop at any time. Develop the dynamic programming equation for the value function starting from the probability distribution $(P_i^t,\ i=1,2,\ldots,n)$.
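One natural form of this equation (stated here only as a sketch, with the value $F$ measured as expected reward minus expected cost, and the process ending when the ball is found or the search is stopped) is
$$
F(P^t) \;=\; \max\Big\{\,0,\;\max_i\big[-c_i + \alpha_i P_i^t R_i + (1-\alpha_i P_i^t)\,F(P^{t+1})\big]\Big\},
$$
where $P^{t+1}$ denotes the Bayes update of $P^t$ after an unsuccessful search of box $i$.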

Show that if $\sum_i c_i/(\alpha_i R_i) < 1$ then it is never optimal to stop searching until the ball is found. In this case, is the policy defined in part (a) optimal?
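For instance (illustrative numbers, not from the question), with $n=2$, $c_1=c_2=1$, $\alpha_1=\alpha_2=\tfrac12$ and $R_1=R_2=5$,
$$
\sum_i \frac{c_i}{\alpha_i R_i} \;=\; \frac{1}{\tfrac12\cdot 5}+\frac{1}{\tfrac12\cdot 5} \;=\; \frac{4}{5} \;<\; 1,
$$
so the condition is satisfied and stopping before the ball is found is never optimal.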