2.II.29I

Optimization and Control
Part II, 2006

A policy $\pi$ is to be chosen to maximize

$$F(\pi, x)=\mathbb{E}_{\pi}\left[\sum_{t \geqslant 0} \beta^{t} r\left(x_{t}, u_{t}\right) \mid x_{0}=x\right]$$

where $0<\beta \leqslant 1$. Assuming that $r \geqslant 0$, prove that $\pi$ is optimal if $F(\pi, x)$ satisfies the optimality equation.
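For reference, the optimality equation is not written out in the question. Under the usual convention for this kind of discounted reward problem, and assuming the controls $u$ range over the set admissible in state $x$, it would read as follows for the candidate $F(\pi, \cdot)$:

$$F(\pi, x)=\sup_{u}\left\{r(x, u)+\beta\, \mathbb{E}\left[F\left(\pi, x_{1}\right) \mid x_{0}=x,\ u_{0}=u\right]\right\}.$$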

An investor receives at time $t$ an income of $x_{t}$, of which he spends $u_{t}$, subject to $0 \leqslant u_{t} \leqslant x_{t}$. The reward is $r\left(x_{t}, u_{t}\right)=u_{t}$, and his income evolves as

$$x_{t+1}=x_{t}+\left(x_{t}-u_{t}\right) \varepsilon_{t},$$

where $\left(\varepsilon_{t}\right)_{t \geqslant 0}$ is a sequence of independent random variables with common mean $\theta>0$. If $0<\beta \leqslant 1 /(1+\theta)$, show that the optimal policy is to take $u_{t}=x_{t}$ for all $t$.
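The claim above can be sanity-checked numerically; the following is a minimal sketch, not a proof. It assumes the candidate value function $F(x)=x /(1-\beta)$, which is what spending everything yields because income then stays constant at $x$. Since this candidate is linear, only the mean $\theta$ of $\varepsilon_{t}$ enters the one-step lookahead. The function names `lookahead` and `best_spend` and the grid search are illustrative assumptions, not part of the question.

```python
def lookahead(x, u, beta, theta):
    """One-step lookahead value: spend u now, then receive the candidate
    value F(y) = y / (1 - beta) from the next state onward.

    F is linear, so E[F(x_next)] depends only on the mean theta of epsilon.
    """
    expected_next_income = x + (x - u) * theta
    return u + beta * expected_next_income / (1.0 - beta)


def best_spend(x, beta, theta, grid=101):
    """Grid search over admissible spends 0 <= u <= x (illustrative only)."""
    candidates = [x * k / (grid - 1) for k in range(grid)]
    return max(candidates, key=lambda u: lookahead(x, u, beta, theta))


if __name__ == "__main__":
    x, theta = 1.0, 0.2
    for beta in (0.5, 0.8):  # both satisfy beta <= 1 / (1 + theta)
        u_star = best_spend(x, beta, theta)
        print(beta, u_star, lookahead(x, u_star, beta, theta), x / (1.0 - beta))
```

For the values of $\beta$ tried here the maximising spend on the grid is $u=x$, and the lookahead value agrees with $x /(1-\beta)$ up to rounding, which is consistent with the candidate satisfying the optimality equation. The lookahead is linear in $u$ with slope $1-\beta \theta /(1-\beta)$, so the sign of that slope is what the grid search is detecting.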

What can you say about the problem if $\beta>1 /(1+\theta)$?