B4.14
A discrete-time decision process is defined on a finite set of states $S$ as follows. Upon entry to state $x_t$ at time $t$ the decision-maker observes a variable $\xi_t$. He then chooses the next state $x_{t+1}$ freely within $S$, at a cost of $c(x_t, \xi_t, x_{t+1})$. Here $(\xi_t)$ is a sequence of integer-valued, identically distributed random variables. Suppose there exist a constant $\lambda$ and a function $\phi$ on $S$ such that for all $x \in S$,
$$\lambda + \phi(x) \le E\Big[\min_{y \in S}\big\{c(x, \xi_t, y) + \phi(y)\big\}\Big].$$
Let $\pi$ denote a policy. Show that the long-run average cost under $\pi$ is at least $\lambda$:
$$\liminf_{t \to \infty} \frac{1}{t}\, E_\pi \sum_{s=0}^{t-1} c(x_s, \xi_s, x_{s+1}) \ge \lambda.$$
At the start of each month a boat manufacturer receives orders for 1, 2 or 3 boats. These numbers are equally likely and independent from month to month. He can produce $j$ boats in a month at a cost of $c_j$ units. All orders are filled at the end of the month in which they are ordered. It is possible to make extra boats, ending the month with a stock of $s$ unsold boats, but $s$ cannot be more than 2, and a holding cost of $h_s$ is incurred during any month that starts with $s$ unsold boats in stock. Write down an optimality equation that can be used to find the long-run expected average cost.
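The problem's numerical costs have not survived in this copy, so the sketch below assumes illustrative figures: producing $p$ boats costs $0$ if $p = 0$ and $4 + 2p$ otherwise, and a month starting with $s$ unsold boats incurs holding cost $s$. Taking the starting stock $s \in \{0, 1, 2\}$ as the state, an optimality equation of the required form is $\lambda + \phi(s) = h_s + \tfrac{1}{3}\sum_{d=1}^{3} \min_p \{c_p + \phi(s + p - d)\}$, where $p$ ranges over productions that fill the month's orders and leave at most 2 boats in stock. A relative-value-iteration sketch for solving it numerically:

```python
# Hypothetical figures -- the problem's original cost numbers are lost in this copy.
def c(p):
    """Assumed production cost: free to make nothing, else a set-up cost plus 2 per boat."""
    return 0 if p == 0 else 4 + 2 * p

def h(s):
    """Assumed holding cost for a month that starts with s unsold boats."""
    return s

STOCKS = (0, 1, 2)
DEMANDS = (1, 2, 3)   # equally likely orders each month

def feasible(s, d):
    # productions p that fill this month's orders (s + p >= d)
    # and leave an end-of-month stock s + p - d of at most 2
    return range(max(0, d - s), d - s + 3)

def relative_value_iteration(iters=500):
    """Approximate lambda and phi in the average-cost optimality equation."""
    phi = {s: 0.0 for s in STOCKS}
    lam = 0.0
    for _ in range(iters):
        new = {s: h(s) + sum(min(c(p) + phi[s + p - d] for p in feasible(s, d))
                             for d in DEMANDS) / 3
               for s in STOCKS}
        lam = new[0]                          # normalise so that phi(0) = 0
        phi = {s: new[s] - lam for s in STOCKS}
    return lam, phi
```

The normalisation $\phi(0) = 0$ each sweep makes the iterates converge; the returned $\lambda$ approximates the minimal long-run average cost under these assumed figures.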
Let $\pi$ be the policy of only ever producing sufficient boats to fill the present month's orders. Show that $\pi$ is optimal if and only if the holding costs are large enough relative to the savings from producing boats in advance.
Suppose that the condition of the previous part fails. Starting from $\pi$, what policy is obtained after applying one step of the policy-improvement algorithm?
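For concreteness, a sketch of one improvement step, again under assumed (not original) figures: producing $p$ boats costs $0$ if $p = 0$ and $4 + 2p$ otherwise, and holding cost is $s$ for a month starting with $s$ unsold boats. The step is: evaluate $\pi$ to obtain its average cost $\lambda$ and relative values $\phi$, then in each state choose the production minimising immediate cost plus the relative value of the resulting stock.

```python
# Hypothetical figures -- the original costs are not recoverable from this copy.
def c(p):
    return 0 if p == 0 else 4 + 2 * p      # assumed production cost

def h(s):
    return s                               # assumed holding cost

DEMANDS = (1, 2, 3)                        # equally likely orders each month

def evaluate_produce_to_order():
    """Average cost lam and relative values phi of the produce-to-order policy.

    Under it production is max(0, d - s) and the next stock is max(0, s - d),
    so stock never increases and the equations solve in the order s = 0, 1, 2.
    """
    lam = sum(c(d) for d in DEMANDS) / 3   # from state s = 0, taking phi(0) = 0
    phi = {0: 0.0}
    for s in (1, 2):
        phi[s] = h(s) + sum(c(max(0, d - s)) + phi[max(0, s - d)]
                            for d in DEMANDS) / 3 - lam
    return lam, phi

def improve(phi):
    """One policy-improvement step: minimise cost-to-go against phi."""
    new_policy = {}
    for s in (0, 1, 2):
        for d in DEMANDS:
            # feasible productions fill the orders and leave stock of at most 2
            new_policy[s, d] = min(range(max(0, d - s), d - s + 3),
                                   key=lambda p: c(p) + phi[s + p - d])
    return new_policy
```

With these assumed costs the improved policy carries stock forward (producing nothing is free, so it pays to build up stock and then skip production in some months); with the original figures the details may of course differ.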