B4.14
A discrete-time decision process is defined on a finite set of states $S$ as follows. Upon entry to state $x_t$ at time $t$ the decision-maker observes a variable $\xi_t$. He then chooses the next state $x_{t+1}$ freely within $S$, at a cost of $c(x_t, \xi_t, x_{t+1})$. Here $(\xi_t)$ is a sequence of integer-valued, identically distributed random variables. Suppose there exist a constant $\lambda$ and a function $\phi$ on $S$ such that for all $x \in S$,
$$\lambda + \phi(x) \le E\Big[\min_{y \in S}\big\{c(x, \xi_t, y) + \phi(y)\big\}\Big].$$
Let $\pi$ denote a policy. Show that the long-run average cost under $\pi$ is at least $\lambda$:
$$\liminf_{t \to \infty} \frac{1}{t}\, E_\pi \sum_{s=0}^{t-1} c(x_s, \xi_s, x_{s+1}) \ge \lambda.$$
At the start of each month a boat manufacturer receives orders for 1, 2 or 3 boats. These numbers are equally likely and independent from month to month. He can produce $j$ boats in a month at a cost of $c_j$ units. All orders are filled at the end of the month in which they are ordered. It is possible to make extra boats, ending the month with a stock of $s$ unsold boats, but $s$ cannot be more than 2, and a holding cost of $h_s$ is incurred during any month that starts with $s$ unsold boats in stock. Write down an optimality equation that can be used to find the long-run expected average cost.
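The problem's numerical costs have not survived in this copy, so the sketch below assumes illustrative figures: producing $p$ boats costs $0$ if $p = 0$ and $4 + 2p$ otherwise, and a month starting with $s$ unsold boats incurs holding cost $s$. Taking the starting stock $s \in \{0, 1, 2\}$ as the state, an optimality equation of the required form is $\lambda + \phi(s) = h_s + \tfrac{1}{3}\sum_{d=1}^{3} \min_p \{c_p + \phi(s + p - d)\}$, where $p$ ranges over productions that fill the month's orders and leave at most 2 boats in stock. A relative-value-iteration sketch for solving it numerically:

```python
# Hypothetical figures -- the problem's original cost numbers are lost in this copy.
def c(p):
    """Assumed production cost: free to make nothing, else a set-up cost plus 2 per boat."""
    return 0 if p == 0 else 4 + 2 * p

def h(s):
    """Assumed holding cost for a month that starts with s unsold boats."""
    return s

STOCKS = (0, 1, 2)
DEMANDS = (1, 2, 3)   # equally likely orders each month

def feasible(s, d):
    # productions p that fill this month's orders (s + p >= d)
    # and leave an end-of-month stock s + p - d of at most 2
    return range(max(0, d - s), d - s + 3)

def relative_value_iteration(iters=500):
    """Approximate lambda and phi in the average-cost optimality equation."""
    phi = {s: 0.0 for s in STOCKS}
    lam = 0.0
    for _ in range(iters):
        new = {s: h(s) + sum(min(c(p) + phi[s + p - d] for p in feasible(s, d))
                             for d in DEMANDS) / 3
               for s in STOCKS}
        lam = new[0]                          # normalise so that phi(0) = 0
        phi = {s: new[s] - lam for s in STOCKS}
    return lam, phi
```

The normalisation $\phi(0) = 0$ each sweep makes the iterates converge; the returned $\lambda$ approximates the minimal long-run average cost under these assumed figures.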
Let $\pi$ be the policy of only ever producing sufficient boats to fill the present month's orders. Show that $\pi$ is optimal if and only if the holding costs are large enough relative to the savings from producing boats in advance.
Suppose that the condition of the previous part fails. Starting from $\pi$, what policy is obtained after applying one step of the policy-improvement algorithm?
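For concreteness, a sketch of one improvement step, again under assumed (not original) figures: producing $p$ boats costs $0$ if $p = 0$ and $4 + 2p$ otherwise, and holding cost is $s$ for a month starting with $s$ unsold boats. The step is: evaluate $\pi$ to obtain its average cost $\lambda$ and relative values $\phi$, then in each state choose the production minimising immediate cost plus the relative value of the resulting stock.

```python
# Hypothetical figures -- the original costs are not recoverable from this copy.
def c(p):
    return 0 if p == 0 else 4 + 2 * p      # assumed production cost

def h(s):
    return s                               # assumed holding cost

DEMANDS = (1, 2, 3)                        # equally likely orders each month

def evaluate_produce_to_order():
    """Average cost lam and relative values phi of the produce-to-order policy.

    Under it production is max(0, d - s) and the next stock is max(0, s - d),
    so stock never increases and the equations solve in the order s = 0, 1, 2.
    """
    lam = sum(c(d) for d in DEMANDS) / 3   # from state s = 0, taking phi(0) = 0
    phi = {0: 0.0}
    for s in (1, 2):
        phi[s] = h(s) + sum(c(max(0, d - s)) + phi[max(0, s - d)]
                            for d in DEMANDS) / 3 - lam
    return lam, phi

def improve(phi):
    """One policy-improvement step: minimise cost-to-go against phi."""
    new_policy = {}
    for s in (0, 1, 2):
        for d in DEMANDS:
            # feasible productions fill the orders and leave stock of at most 2
            new_policy[s, d] = min(range(max(0, d - s), d - s + 3),
                                   key=lambda p: c(p) + phi[s + p - d])
    return new_policy
```

With these assumed costs the improved policy carries stock forward (producing nothing is free, so it pays to build up stock and then skip production in some months); with the original figures the details may of course differ.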