Markov decision process and dynamic programming solution

dynamic-programming, markov-process, probability, probability-theory, stochastic-processes

A home supply store can place orders for fridges at the start of each month for immediate delivery. A cost of $\$ 100$ is incurred each time an order is placed. The storage cost is $\$ 5$ per fridge per month. The penalty for running out of stock is estimated at $\$ 150$ per fridge per month. The monthly demand is given by:

$$\begin{array}{c|ccc}
\text{demand } x & 0 & 1 & 2\\\hline
\text{pmf } p(x) & 0.2 & 0.5 & 0.3
\end{array}$$

The policy of the store is that the maximum level of stock should not exceed $2$ fridges in any month.

a) Determine the transition probabilities of the different decision options of the problem.

b) Find the expected monthly inventory cost as a function of the state of the system and the decision alternative.

c) Find the optimal ordering policy over the next $3$ months.

Attempt.

  • a) I tried to follow Henry's comment:

Take for example having 1 in stock at the beginning of the month and ordering 0, which would be represented by a single entry in your array. But what you need to describe for this single case is (a) demand of 0 with probability 0.2, so transitioning to 1 at a cost of $\$5$; (b) demand of 1 with probability 0.5, so transitioning to 0 at a cost perhaps of $\$0$ or of $\$5$ (less profit); (c) demand of 2 with probability 0.3, so transitioning to 0 at a cost perhaps of $\$150$ or of $\$155$ (less profit).

Take the states $0,1,2$ to be the fridges available (columns) and $0,1,2$ the possible fridges to order at the start of the month (rows). Having $1$ in stock and ordering $0$, the matrix is

$$P=\begin{pmatrix}
&&\\
.5\text{ or }.3&.2&0\\
&&
\end{pmatrix}$$

And the matrix of costs

$$C=\begin{pmatrix}
&&\\
\$0\text{ or }\$5\ \text{ OR }\ \$150\text{ or }\$155&\$5&\$0\\
&&
\end{pmatrix}$$

As you can see in the first (and second) matrix, I don't know what to place in entry $(1,0)$, since both probabilities (costs) correspond to ending up in state $0$.

How can I solve this problem?

Note also that I need it in matrix form to solve b).

  • b) To find the expected monthly inventory cost as a function of the state of the system and the decision alternative, I think I should use $$c_i(k)=\displaystyle\sum_{j=0}^M r_{ij}(k)\,p_{ij}(k)$$ where $k$ is the decision considered, $r_{ij}(k)$ is the cost incurred by choosing decision $k$ in state $i$ and transitioning to state $j$, and $M$ is the total number of states.

  • c) I think I can find the solution using dynamic programming with $N=3$ via the recursion $$f_n(i)=\min_k\Big\{\sum_{j=0}^{M}p_{ij}^k\big[r_{ij}^k+f_{n+1}(j)\big]\Big\},\qquad n=1,2,\dots,N,$$ where $f_{N+1}(j)=0$ for all $j$ (a $\min$ rather than a $\max$, since the $r$'s here are costs).
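If it helps, here is a minimal sketch of how that recursion can be run for $N=3$ (Python; all names are my own, and since the $r$'s are costs the outer operator is taken as a minimum — check it against your own derivation):

```python
# Backward dynamic programming over N = 3 months, minimizing expected cost.
# State i = fridges in stock at the start of a month (0, 1, or 2);
# action k = fridges ordered, feasible only when i + k <= 2 (store policy).
# Data straight from the problem: $100 order fee, $5 storage, $150 shortage.

demand = [(0, 0.2), (1, 0.5), (2, 0.3)]   # (demand x, probability p(x))
N = 3

def upkeep(s, x):
    """Storage or shortage cost once stock is s and demand is x."""
    return 5 * (s - x) if s >= x else 150 * (x - s)

f = [0.0, 0.0, 0.0]                        # f_{N+1}(j) = 0 for all j
policy = []                                # policy[n-1][i] = optimal order in month n
for n in range(N, 0, -1):                  # months N, N-1, ..., 1
    g, best = [0.0, 0.0, 0.0], [0, 0, 0]
    for i in range(3):
        costs = {}
        for k in range(3 - i):             # orders that keep total stock <= 2
            s = i + k
            costs[k] = (100 if k > 0 else 0) + sum(
                p * (upkeep(s, x) + f[max(s - x, 0)]) for x, p in demand)
        best[i], g[i] = min(costs.items(), key=lambda kv: kv[1])
    f, policy = g, [best] + policy

print("f_1 =", f)        # minimal expected 3-month cost from each starting state
print("policy =", policy)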

Am I correct so far?

Help me please

Best Answer

  1. If I understand the problem correctly, there is a small penalty for having too many fridges (\$5/fridge) and a large penalty for having too few fridges (\$150/fridge). A sensible first guess, therefore, is to buy enough fridges so that you have the maximum number of fridges (2) each month.

  2. To compute the expected upkeep costs, first make a cost matrix. Forget about the buying process (the decision to buy fridges and how much you pay for each order). Just consider how much you pay for upkeep as a function of $i$ (the number of fridges you have) and $j$ (the number of fridges your customers want):

    $$C = \quad\begin{array}{rc}&\text{they want }\rightarrow\\\text{you have }\downarrow&\begin{bmatrix}\$0 & -\$150\times 1 & -\$150\times 2 \\ -\$5\times1 &\$0&-\$150\times1\\-\$5\times2 & -\$5\times 1 & \$0\\\end{bmatrix}\end{array}$$

    You determine how much supply you have (the row of the cost matrix). The customer demand is chosen at random. To compute the expected value of each supply, you can multiply this cost matrix by the probability of each demand amount:

    $$E =\begin{bmatrix}\$0 & -\$150\times 1 & -\$150\times 2 \\ -\$5\times1 &\$0&-\$150\times1\\-\$5\times2 & -\$5\times 1 & \$0\\\end{bmatrix}\begin{bmatrix}0.2\\0.5\\0.3\end{bmatrix} = \begin{bmatrix}-\$165.00\\-\$46.00\\-\$4.50\end{bmatrix}$$

    That is, if you have 0 fridges, your expected cost is \$165/mo. If you have 1 fridge, your expected cost is \$46/mo. If you have 2 fridges, your expected cost is \$4.50/mo. You should therefore always want two fridges in stock.

  3. You get to choose how much stock to buy at the start of each month. Therefore you can always guarantee that you have two fridges in stock. It costs \$100 to place an order for any number of fridges.

    If we assume that you will always want to have two fridges in stock, we can include ordering costs in the cost matrix. The number of fridges you have at the end of the month is:

    $$\begin{array}{rc}&\text{they want }\rightarrow\\\text{you have }\downarrow&\begin{bmatrix}0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & 1 & 0\\\end{bmatrix}\end{array}$$

    If you place a \$100 order for new fridges whenever you have 0 or 1 fridges remaining, your reordering costs are:

    $$C_2 = \begin{array}{rc}&\text{they want }\rightarrow\\\text{you have }\downarrow&\begin{bmatrix}-\$100 & -\$100 & -\$100 \\ -\$100 & -\$100 & -\$100 \\ \$0 & -\$100 & -\$100\\\end{bmatrix}\end{array}$$
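Both computations above are easy to check numerically. A small sketch (plain Python, with costs written as positive dollar amounts instead of negative rewards; the variable names are my own):

```python
# Expected upkeep cost per month as a function of starting stock (section 2).
C = [[0, 150, 300],    # you have 0: $150 shortage penalty per missing fridge
     [5, 0, 150],      # you have 1
     [10, 5, 0]]       # you have 2: $5 storage per leftover fridge
p = [0.2, 0.5, 0.3]    # P(demand = 0), P(demand = 1), P(demand = 2)

E = [sum(c * q for c, q in zip(row, p)) for row in C]
print("E =", E)        # expected upkeep for stock levels 0, 1, 2

# Section 3's policy: always top up to 2, so each month starts with 2 fridges.
# A $100 order is then needed for the next month whenever demand was 1 or 2.
monthly = E[2] + 100 * (p[1] + p[2])
print("expected monthly cost of the fill-to-2 policy:", monthly)
```

With the problem's numbers this gives $E = [165, 46, 4.5]$, matching the vector above, and an expected monthly cost of \$84.50 for the fill-to-2 policy.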




  • In a particular month, you may have 0, 1, or 2 refrigerators in stock. This is your supply, and you get to control it by deciding how many refrigerators to buy.

  • In a particular month, your customers may want 0, 1, or 2 refrigerators. This is the demand, and it is considered to be a probabilistic outcome. According to your table, there is a 20% chance that the demand will be 0, a 50% chance the demand will be 1, and a 30% chance that it will be 2.

  • Let's consider what happens in a given month. You have a certain number of fridges in stock, between 0 and 2. At the start of the month, you can buy any number of fridges (as long as the total does not exceed two); placing such an order costs some amount of money.

    Probabilistically, there will be demand for some fridges. If there is more demand than supply, you lose a certain penalty amount of money. If there is more supply than demand, you have to pay a small amount of money for the stock you keep around.

  • A Markov decision process consists of states with arrows between them. Each arrow is labeled with an action $a$ and a probability $p$. The labels mean "If I am in state X and I take action $a$, then with probability $p$ I end up in state Y".

  • Here, the actions correspond to supplies you can order—you can order 0, 1, or 2 new refrigerators.

  • The probabilities correspond to demand for refrigerators. The demand is chosen at random according to the probabilities in your table.

  • To take a simple example, suppose you have 1 refrigerator in stock right now. This is your current state. At the start of the month, your allowed actions are to buy one more refrigerator $A_+$, or to buy no more refrigerators $A_0$. There are three possible outcomes: people buy zero, one, or two refrigerators. The probabilities are determined by your table.

    Consider the six possible outcomes. First, if you buy a new fridge, you'll pay an ordering fee and have two fridges total. People may want 0, 1, or 2 fridges. 20% of the time, they won't want any fridges, so you'll have to pay to keep two fridges around. 50% of the time, they'll want one fridge, so you'll pay to keep the other around. 30% of the time, they'll want both fridges, which is ideal.

    Second, if you buy no new fridges, you'll still have one fridge and won't have to pay the ordering fee. 20% of the time, they won't want any fridges, so you'll have to pay to keep the one fridge around. 50% of the time, they'll want your one fridge, which is ideal. 30% of the time, they'll want two fridges, but since you only have one, you'll lose money.

    The arrows point from the state you start in at the beginning of the month to the state you end up in at the end of the month. For example, if you start with one fridge and buy one more, and no one buys any refrigerators, the arrow will point from the state with 1 fridge (start of the month) to the state with 2 fridges (end of the month).

  • Following this same reasoning, consider what you can do if you instead start with zero fridges, or start with two fridges. You can order a certain number of new fridges, and people will demand between 0 and 2 of them.

  • A matrix is a good way to record probabilities. For each deterministic decision $k$ you make (to buy 0, 1, or 2 fridges), you can make a matrix recording the different outcomes of your decision. The rows of the matrix can denote your starting state (# fridges at start of month = 0, 1, 2), and the columns can denote your ending state (# fridges at end of month = 0, 1, 2). The probabilities come from your table.
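Following that recipe, here is one way the per-decision transition matrices might be built (a sketch in Python; the state/action encoding is my own). Note how it resolves the $(1,0)$-entry question from the post: probabilities of different demands that land in the same state are simply added.

```python
# One 3x3 transition matrix per order quantity k (the decision).
# Rows: stock i at the start of this month; columns: stock j at the start of
# the next month. Ordering k is only feasible when i + k <= 2 (store policy).
demand_pmf = [(0, 0.2), (1, 0.5), (2, 0.3)]   # (demand x, probability p(x))

def transition_matrix(k):
    P = []
    for i in range(3):
        s = i + k                    # stock on hand after the order arrives
        if s > 2:                    # infeasible: would exceed the 2-fridge cap
            P.append(None)
            continue
        row = [0.0, 0.0, 0.0]
        for x, p in demand_pmf:
            row[max(s - x, 0)] += p  # demands x >= s all land in state 0
        P.append(row)
    return P

for k in range(3):
    print(f"k = {k}:", transition_matrix(k))
```

For example, `transition_matrix(0)[1]` comes out as `[0.8, 0.2, 0.0]`: with 1 in stock and no order, demands of 1 and 2 both leave the store empty, so $.5 + .3 = .8$ goes in the entry for state $0$.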
