A policy π is to be chosen to maximize
$$F(\pi, x) = E_\pi\left[\, \sum_{t \geqslant 0} \beta^t r(x_t, u_t) \;\middle|\; x_0 = x \right]$$
where 0<β⩽1. Assuming that r⩾0, prove that π is optimal if F(π,x) satisfies the optimality equation.
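For reference, the optimality equation is not written out in the question. The block below states it in the standard dynamic programming form, which is an assumption about the intended notation, and sketches one possible line of argument. Here $F(x)$ abbreviates $F(\pi, x)$ and $\pi'$ is an arbitrary competing policy, a symbol introduced only for this sketch.

```latex
% Optimality equation, assumed standard form, with F(x) = F(\pi, x):
\[
F(x) = \sup_{u} \Bigl\{ r(x,u) + \beta\, E\bigl[ F(x_1) \mid x_0 = x,\ u_0 = u \bigr] \Bigr\}.
\]
% The supremum dominates the choice made by any other policy \pi'.
% Iterating that inequality n times along \pi' gives
\[
F(x) \geqslant E_{\pi'}\Bigl[ \sum_{t=0}^{n-1} \beta^t r(x_t,u_t) \Bigm| x_0 = x \Bigr]
      + \beta^n E_{\pi'}\bigl[ F(x_n) \mid x_0 = x \bigr]
     \geqslant E_{\pi'}\Bigl[ \sum_{t=0}^{n-1} \beta^t r(x_t,u_t) \Bigm| x_0 = x \Bigr].
\]
% The last step uses F \geqslant 0, which follows from r \geqslant 0.
% Letting n \to \infty (monotone convergence, again since r \geqslant 0):
\[
F(\pi, x) \geqslant F(\pi', x) \quad \text{for every policy } \pi'.
\]
```

The assumption $r \geqslant 0$ is used twice in this sketch: to discard the terminal term $\beta^n E_{\pi'}[F(x_n)]$, and to pass to the limit in the partial sums.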
An investor receives at time $t$ an income of $x_t$, of which he spends $u_t$, subject to $0 \leqslant u_t \leqslant x_t$. The reward is $r(x_t, u_t) = u_t$, and his income evolves as
$$x_{t+1} = x_t + (x_t - u_t)\,\varepsilon_t,$$
where $(\varepsilon_t)_{t \geqslant 0}$ is a sequence of independent random variables with common mean $\theta > 0$. If $0 < \beta \leqslant 1/(1+\theta)$, show that the optimal policy is to take $u_t = x_t$ for all $t$.
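One way to approach this part is to compute the value of the policy $u_t = x_t$ and check it against the optimality equation. The block below is a sketch of that check, not a full solution. It assumes the optimality equation in its standard form and that incomes stay non-negative, so that the constraint $0 \leqslant u_t \leqslant x_t$ is always feasible.

```latex
% Under u_t = x_t the income is constant, x_{t+1} = x_t = x, and
% \beta \leqslant 1/(1+\theta) < 1 because \theta > 0, so
\[
F(x) = \sum_{t \geqslant 0} \beta^t x = \frac{x}{1-\beta}.
\]
% Right-hand side of the optimality equation, using E[\varepsilon_0] = \theta:
\[
\max_{0 \leqslant u \leqslant x} \Bigl\{ u + \frac{\beta}{1-\beta}\, E\bigl[ x + (x-u)\varepsilon_0 \bigr] \Bigr\}
 = \max_{0 \leqslant u \leqslant x} \Bigl\{ \Bigl( 1 - \frac{\beta\theta}{1-\beta} \Bigr) u
   + \frac{\beta (1+\theta)}{1-\beta}\, x \Bigr\}.
\]
% The coefficient of u is non-negative exactly when \beta(1+\theta) \leqslant 1,
% so the maximum is attained at u = x, where the bracket equals
\[
x + \frac{\beta}{1-\beta}\, x = \frac{x}{1-\beta} = F(x).
\]
```

Since $r(x_t, u_t) = u_t \geqslant 0$, the first part of the question then applies to this $F$.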
What can you say about the problem if β>1/(1+θ)?
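As a hint at what can go wrong in this regime, the block below evaluates a family of "postponement" policies. The policy $\pi_T$ (a name introduced here for illustration) saves everything up to time $T$ and consumes all income from then on. The sketch assumes $x > 0$ and that incomes remain non-negative, and it treats the case $\beta < 1$.

```latex
% Policy \pi_T: u_t = 0 for t < T, and u_t = x_t for t \geqslant T.
% While saving, x_{t+1} = x_t (1 + \varepsilon_t) with \varepsilon_t independent of x_t, so
\[
E[x_T \mid x_0 = x] = (1+\theta)^T x .
\]
% From time T onward the income is frozen at x_T and fully consumed:
\[
F(\pi_T, x) = \sum_{t \geqslant T} \beta^t\, E[x_T \mid x_0 = x]
            = \frac{\bigl( \beta (1+\theta) \bigr)^T}{1-\beta}\, x .
\]
% If \beta (1+\theta) > 1 this grows without bound as T \to \infty:
\[
\sup_{T} F(\pi_T, x) = \infty .
\]
```

Under these assumptions the attainable reward is unbounded, so $u_t = x_t$ is no longer optimal. For $\beta = 1$ the policy $u_t = x_t$ already gives an infinite total reward.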