B4.14

Optimization and Control
Part II, 2002

A discrete-time decision process is defined on a finite set of states II as follows. Upon entry to state iti_{t} at time tt the decision-maker observes a variable ξt\xi_{t}. He then chooses the next state freely within II, at a cost of c(it,ξt,it+1)c\left(i_{t}, \xi_{t}, i_{t+1}\right). Here {ξ0,ξ1,}\left\{\xi_{0}, \xi_{1}, \ldots\right\} is a sequence of integer-valued, identically distributed random variables. Suppose there exist {ϕi:iI}\left\{\phi_{i}: i \in I\right\} and λ\lambda such that for all iIi \in I

ϕi+λ=kZP(ξt=k)miniI[c(i,k,i)+ϕi].\phi_{i}+\lambda=\sum_{k \in \mathbb{Z}} P\left(\xi_{t}=k\right) \min _{i^{\prime} \in I}\left[c\left(i, k, i^{\prime}\right)+\phi_{i^{\prime}}\right] .

Let π\pi denote a policy. Show that

λ=infπlim suptEπ[1ts=0t1c(is,ξs,is+1)]\lambda=\inf _{\pi} \limsup _{t \rightarrow \infty} E_{\pi}\left[\frac{1}{t} \sum_{s=0}^{t-1} c\left(i_{s}, \xi_{s}, i_{s+1}\right)\right]

At the start of each month a boat manufacturer receives orders for 1, 2 or 3 boats. These numbers are equally likely and independent from month to month. He can produce jj boats in a month at a cost of 6+3j6+3 j units. All orders are filled at the end of the month in which they are ordered. It is possible to make extra boats, ending the month with a stock of ii unsold boats, but ii cannot be more than 2 , and a holding cost of cic i is incurred during any month that starts with ii unsold boats in stock. Write down an optimality equation that can be used to find the long-run expected average-cost.

Let π\pi be the policy of only ever producing sufficient boats to fill the present month's orders. Show that it is optimal if and only if c2c \geqslant 2.

Suppose c<2c<2. Starting from π\pi, what policy is obtained after applying one step of the policy-improvement algorithm?