Paper 2, Section II, $26 \mathrm{~K}$

As a function of policy $\pi$ and initial state $x$ , let

F(\pi, x)=E_{\pi}\left[\sum_{t=0}^{\infty} \beta^{t} r\left(x_{t}, u_{t}\right) \mid x_{0}=x\right]

where $\beta \geqslant 1$ and $r(x, u) \geqslant 0$ for all $x, u$ . Suppose that for a specific policy $\pi$ , and all $x$ ,

F(\pi, x)=\sup _{u}\left\{r(x, u)+\beta E\left[F\left(\pi, x_{1}\right) \mid x_{0}=x, u_{0}=u\right]\right\} .

Prove that $F(\pi, x) \geqslant F\left(\pi^{\prime}, x\right)$ for all $\pi^{\prime}$ and $x$ .

A gambler plays games in which he may bet 1 or 2 pounds, but no more than his present wealth. Suppose he has $x_{t}$ pounds after $t$ games. If he bets $i$ pounds then $x_{t+1}=x_{t}+i$ , or $x_{t+1}=x_{t}-i$ , with probabilities $p_{i}$ and $1-p_{i}$ respectively. Gambling terminates at the first $\tau$ such that $x_{\tau}=0$ or $x_{\tau}=100$ . His final reward is $(9 / 8)^{\tau / 2} x_{\tau}$ . Let $\pi$ be the policy of always betting 1 pound. Given $p_{1}=1 / 3$ , show that $F(\pi, x) \propto x 2^{x / 2}$ .

Is $\pi$ optimal when $p_{2}=1 / 4$ ?