Paper 3, Section II, K
A burglar having wealth may retire, or go burgling another night, in either of towns 1 or 2 . If he burgles in town then with probability he will, independently of previous nights, be caught, imprisoned and lose all his wealth. If he is not caught then his wealth increases by 0 or , each with probability and independently of what happens on other nights. Values of and are the same every night. He wishes to maximize his expected wealth at the point he retires, is imprisoned, or nights have elapsed.
Using the dynamic programming equation
with appropriately defined, prove that there exists an optimal policy under which he burgles another night if and only if his wealth is less than .
Suppose and . Prove that he should never burgle in town 2 .
[Hint: Suppose , there are nights to go, and it has been shown that he ought not burgle in town 2 if less than nights remain. For the case , separately consider subcases and . An interchange argument may help.]