Paper 3, Section II, K

Optimization and Control
Part II, 2015

A burglar having wealth xx may retire, or go burgling another night, in either of towns 1 or 2 . If he burgles in town ii then with probability pi=1qip_{i}=1-q_{i} he will, independently of previous nights, be caught, imprisoned and lose all his wealth. If he is not caught then his wealth increases by 0 or 2ai2 a_{i}, each with probability 1/21 / 2 and independently of what happens on other nights. Values of pip_{i} and aia_{i} are the same every night. He wishes to maximize his expected wealth at the point he retires, is imprisoned, or ss nights have elapsed.

Using the dynamic programming equation

Fs(x)=max{x,q1EFs1(x+R1),q2EFs1(x+R2)}F_{s}(x)=\max \left\{x, q_{1} E F_{s-1}\left(x+R_{1}\right), q_{2} E F_{s-1}\left(x+R_{2}\right)\right\}

with Rj,F0(x)R_{j}, F_{0}(x) appropriately defined, prove that there exists an optimal policy under which he burgles another night if and only if his wealth is less than x=maxi{aiqi/pi}x^{*}=\max _{i}\left\{a_{i} q_{i} / p_{i}\right\}.

Suppose q1>q2q_{1}>q_{2} and q1a1>q2a2q_{1} a_{1}>q_{2} a_{2}. Prove that he should never burgle in town 2 .

[Hint: Suppose x<xx<x^{*}, there are ss nights to go, and it has been shown that he ought not burgle in town 2 if less than ss nights remain. For the case a2>a1a_{2}>a_{1}, separately consider subcases x+2a2xx+2 a_{2} \geqslant x^{*} and x+2a2<xx+2 a_{2}<x^{*}. An interchange argument may help.]