Paper 2, Section II, 26K

Optimization and Control
Part II, 2015

As a function of policy $\pi$ and initial state $x$, let

$$F(\pi, x)=E_{\pi}\left[\sum_{t=0}^{\infty} \beta^{t} r\left(x_{t}, u_{t}\right) \mid x_{0}=x\right]$$

where $\beta \geqslant 1$ and $r(x, u) \geqslant 0$ for all $x, u$. Suppose that for a specific policy $\pi$, and all $x$,

$$F(\pi, x)=\sup_{u}\left\{r(x, u)+\beta E\left[F\left(\pi, x_{1}\right) \mid x_{0}=x, u_{0}=u\right]\right\}.$$

Prove that $F(\pi, x) \geqslant F\left(\pi^{\prime}, x\right)$ for all $\pi^{\prime}$ and $x$.
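A sketch of one standard line of argument, offered as a guide rather than as the official solution: fix any policy $\pi'$ and iterate the optimality equation along the trajectory that $\pi'$ generates from $x_0 = x$.

```latex
\begin{align*}
F(\pi,x) &\geqslant E_{\pi'}\left[ r(x_0,u_0) + \beta F(\pi,x_1) \,\middle|\, x_0 = x \right] \\
&\geqslant E_{\pi'}\left[ \sum_{t=0}^{s-1} \beta^{t} r(x_t,u_t) + \beta^{s} F(\pi,x_s) \,\middle|\, x_0 = x \right] \\
&\geqslant E_{\pi'}\left[ \sum_{t=0}^{s-1} \beta^{t} r(x_t,u_t) \,\middle|\, x_0 = x \right]
\;\longrightarrow\; F(\pi',x) \quad (s \to \infty).
\end{align*}
```

The first line holds because whatever action $\pi'$ takes at $x_0$ is one of the $u$ in the supremum. Repeating this at $x_1, x_2, \dots$, conditioning on the history, gives the second line for every $s$. The third line drops $\beta^{s} F(\pi,x_s) \geqslant 0$, which uses $r \geqslant 0$, and the limit follows by monotone convergence, again because every term is non-negative.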

A gambler plays games in which he may bet 1 or 2 pounds, but no more than his present wealth. Suppose he has $x_{t}$ pounds after $t$ games. If he bets $i$ pounds then $x_{t+1}=x_{t}+i$ or $x_{t+1}=x_{t}-i$, with probabilities $p_{i}$ and $1-p_{i}$ respectively. Gambling terminates at the first $\tau$ such that $x_{\tau}=0$ or $x_{\tau}=100$. His final reward is $(9/8)^{\tau/2} x_{\tau}$. Let $\pi$ be the policy of always betting 1 pound. Given $p_{1}=1/3$, show that $F(\pi, x) \propto x 2^{x/2}$.
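A quick numerical sanity check, not part of the question, assuming the reward is read as $\beta^{\tau} x_{\tau}$ with $\beta = (9/8)^{1/2} = 3/(2\sqrt{2})$ and no running reward. Under policy $\pi$ the value must satisfy $F(\pi,x) = \beta\,[p_1 F(\pi,x+1) + (1-p_1) F(\pi,x-1)]$ for $0 < x < 100$, with $F(\pi,0) = 0$ and $F(\pi,100) = 100$. Rewriting this as $F(x+1) - 2\sqrt{2}\,F(x) + 2F(x-1) = 0$ gives a repeated characteristic root $\sqrt{2}$, so solutions have the form $(A + Bx)\,2^{x/2}$, and $F(\pi,0)=0$ forces $A=0$. The constant $B = 2^{-50}$ then comes from $F(\pi,100)=100$. The names `BETA`, `P1`, `N` and `F` below are illustrative.

```python
import math

BETA = math.sqrt(9 / 8)  # assumed per-game factor, so (9/8)^(tau/2) = BETA**tau
P1 = 1 / 3               # probability of winning a 1-pound bet
N = 100                  # target wealth at which gambling stops


def F(x: int) -> float:
    """Candidate value of always betting 1 pound: x * 2^(x/2), scaled so F(N) = N."""
    return x * 2 ** ((x - N) / 2)


# Boundary conditions: nothing at ruin, exactly N if he starts at N (tau = 0).
assert F(0) == 0
assert F(N) == N

# Every interior state must satisfy the one-step equation under pi.
for x in range(1, N):
    rhs = BETA * (P1 * F(x + 1) + (1 - P1) * F(x - 1))
    assert math.isclose(F(x), rhs, rel_tol=1e-12)
```

The loop passing for every $0 < x < 100$ is consistent with $F(\pi, x) = 2^{-50}\, x\, 2^{x/2}$.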

Is $\pi$ optimal when $p_{2}=1/4$?
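One hedged way to probe this numerically, keeping the assumptions of the previous check: compare $F(\pi,x)$ with the value of betting 2 pounds once at $x$ and then reverting to $\pi$, namely $\beta\,[p_2 F(\pi,x+2) + (1-p_2) F(\pi,x-2)]$. If this ever strictly exceeds $F(\pi,x)$, the policy that deviates at that state does better than $\pi$, so $\pi$ is not optimal. The question does not say what happens if a 2-pound bet would overshoot 100, so the sketch only considers $2 \leqslant x \leqslant 98$; that restriction is an assumption, and the helper `bet_two_value` is hypothetical.

```python
import math

BETA = math.sqrt(9 / 8)  # assumed per-game factor, as before
P2 = 1 / 4               # probability of winning a 2-pound bet
N = 100                  # target wealth at which gambling stops


def F(x: int) -> float:
    """Value of always betting 1 pound (with p1 = 1/3), scaled so F(N) = N."""
    return x * 2 ** ((x - N) / 2)


def bet_two_value(x: int) -> float:
    """Hypothetical helper: bet 2 pounds once at x, then follow pi."""
    return BETA * (P2 * F(x + 2) + (1 - P2) * F(x - 2))


# States where a single 2-pound bet strictly beats pi (small tolerance for rounding).
# Assumption: only 2 <= x <= 98, so the bet cannot overshoot 100.
improving = [x for x in range(2, N - 1) if bet_two_value(x) > F(x) * (1 + 1e-12)]
print(improving)  # [2, 3]
```

By hand, the ratio of the two values is $3(7x+2)/(16\sqrt{2}\,x)$, which exceeds 1 only when $x < 6/(16\sqrt{2} - 21) \approx 3.69$, matching the printed states.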