Paper 3, Section II, K

A burglar having wealth $x$ may retire, or go burgling another night, in either of towns 1 or 2 . If he burgles in town $i$ then with probability $p_{i}=1-q_{i}$ he will, independently of previous nights, be caught, imprisoned and lose all his wealth. If he is not caught then his wealth increases by 0 or $2 a_{i}$ , each with probability $1 / 2$ and independently of what happens on other nights. Values of $p_{i}$ and $a_{i}$ are the same every night. He wishes to maximize his expected wealth at the point he retires, is imprisoned, or $s$ nights have elapsed.

Using the dynamic programming equation

F_{s}(x)=\max \left\{x, q_{1} E F_{s-1}\left(x+R_{1}\right), q_{2} E F_{s-1}\left(x+R_{2}\right)\right\}

with $R_{j}, F_{0}(x)$ appropriately defined, prove that there exists an optimal policy under which he burgles another night if and only if his wealth is less than $x^{*}=\max _{i}\left\{a_{i} q_{i} / p_{i}\right\}$ .

Suppose $q_{1}>q_{2}$ and $q_{1} a_{1}>q_{2} a_{2}$ . Prove that he should never burgle in town 2 .

[Hint: Suppose $x<x^{*}$ , there are $s$ nights to go, and it has been shown that he ought not burgle in town 2 if less than $s$ nights remain. For the case $a_{2}>a_{1}$ , separately consider subcases $x+2 a_{2} \geqslant x^{*}$ and $x+2 a_{2}<x^{*}$ . An interchange argument may help.]