# A DP Analysis of the Student Prince Pub Problem

So suppose that we have these famous three mug and further assume that the objective of the beer pouring operation is to minimize the total volume of beer poured. In other words, we wish to move 4 oz of beer from the large Red mug to the middle Blue mug using a pouring scheme that minimizes the amount of beer pored in the process. Recall that the initial state of the process is s=(0,0) and its final state is s=(0,4) where s=(g,b) with g representing the amount of beer in the Blue mug and b reprsents the amount of beer in the Gray mug.

Let f(g,b) the minimum amount of beer that must be poured so as to move from the initial state to the state s=(g,b). Then, our objective is to determine the value of f(0,4).

Now, clearly, to reach a feasible state s=(g,b) from the initial state s=(0,0) we must reach some feasible state s'=(g',b') and make some feasible decision a=(i,j) associated with s' such that s is the state resulting from applying the decision a=(i,j) at state s'. Consequently, the following relation must hold:

(1) . . . . . f(g,b) = q(i,j,g',b') + f(g',b')

where g(i,j,g',b') is the amount of beer poured if action a=(i,j) is applied at state s'=(g'b').

That's very fine, but ....., the trouble is that we do not know what specific state s' is the best in this framework. We thus have to select the best (smallest feasible) value on the right-hand side of (1). This produce the following typical DP functional equation:

(2) . . . . . f(g,b) = min {q(i,j,g',b') + f(g',b') : (g',b') in Pred(g,b)}

where Pred(s) denotes the set of all the immediate predecessors of state s, namely Pred(s) is the colleftion of all states s' such that s can be reached in one pouring from state s'.

Unfortunately, it is not very convenient to solve (2) for all the feasible because the right-hand side of (2) requires the values of f(g',b') for all the immediate predecessors of state s=(g,b). This means that we have to solve (2) in a particular order with regard to the pairs (g,b). It turns out that the order required by the Pred() is not very convenient. Rather, it is more convenient to use the immediate successors.

For this reason it is important to appreciate the following two simple facts of life: First, the obvious observation

(3) . . . . . f(0,0) = 0

and then the bureaucratic point that the functional equation can be rewritten with "successors" rather than "predecessors" in mind. That is,

(4) . . . . . f(g,b) = min {q(i,j,g',b') + f(g',b') : (g,b) in Suc(g',b')}

where Suc(s') denotes the set of all immediate successors of state s'.

We consider two successive approximation algorithms for this functional equation. They differ in the manner in which the feasible states are generated through the successive approximation procedure.

If you have not done any successive approximation before, here is a very brief outline of how it works.

• Initialisation:
You approximate the function you wish to construct by a suitable convenient scheme. So this is the initial approximation.

• Update:
You successively update the approximation by some scheme utilising the most recent approximation, typically utilising a basic property of the function you with to construct.

• termination:
You terminate the process either if the process converges, or it becomes apparent that the process will not converge or if you run out patient, whichever comes first.

The first procedure we shall examine is simple in that the order in which the states are generated is based on a width-first strategy. The second is a bit more sophisticated: the states are generated in a best first order.

We shall explain what these terms mean after we formulate the algorithms. For the time being let us just say that the width-first algorithm generates all the feasible states. The states are generated in a chronological order based on how far (transitions) they are from the initial state. The best-first algorithm is smarter: it excludes from the state generation process some of the non-optimal immediate successors. This trick capitalises on the fact that the values of q(i,j,g',b') are strictly positive, hence, f(s) > f(s') for any pair of states (s,s') such that s is an immediate successor of s' and (1) holds.

Here is the width-first algorithm.

### Algorithm A: Simplicity is Beautiful!

• Step 1: Initialisation
Set h(0,0) = 0 and h(s) = Infinity for all other states. Also, set S'={(0,0)}.

• Step 2: Updating
For each (g',b') in S' do:

h(g,b) = min { h(g,b) , q(i,j,g',b') + h(g',b')}

for all (g,b) in Suc(g',b').

Let S' be the set of all the (g,b) pairs for which there was decrease in the value of h(g,b) as a result of this update.

• Step 3: Stopping Rule
If S' is empty stop. Otherwise, go to Step 2.

It is not difficult to show that upon completion this algorithm yields values of h(s) such that h(s) = f(s) for all the feasible states and h(s) = Infinity for all the non-feasible states. Note that a priori we do not know what states are feasible and what states are not feasible. Thus, at the initialisation step we may consider all states s=(g,b) such that g is in {0,1,2,3} and b is in {0,1,2,3,4,5}.

We illustarte how the algorithm works by going through two iterations of the updating step.

Step 1:
Set h(0,0) = 0, h(s) = Infinity for all other states and S'= {(0,0)}.

Step 2:
The set S' consists of only one state, namely s=(0,0), so we only have to update the h(s) values associated with the elements of Suc(0,0) = {(3,0),(0,5)}:

h(3,0) = min { h(3,0) , q(Red,Gray,0,0) + h(0,0) } = min {infinity , 3 + 0} = 3
h(0,5) = min { h(0,5) , q(Red,Blue,0,0) + h(0,0) }= min {Infinity , 5 + 0} = 5

There was an improvement in both h(3,0) and h(0,5) hence we set S'={(3,0),(0,5)}.

Step 3:
Since S' is not empty we go to Step 2.

Step 2:
We have to generate the immediate successors of the elements of S'={(3,0),(0,5)} and update thier h(s) values. The successors set are as follows:

Suc(3,0) = {(0,3),(3,5)}
Suc(0,5) = {(3,2),(3,5)}

So we have to update the values of h(0,3), h(3,5) and h(3,2). Observe that the value of h(3,5) is updated twice and that there is no need to update the value of h(0,0) (why?).

h(0,3) = min { h(0,3) , q(Gray,Blue,3,0) + h(3,0) = min {Infinity , 3 + 3} = 6
h(3,5) = min { h(3,5) , q(Red,Blue,3,0) + h(3,0)} = min {Infinity , 5 + 3} = 8
h(3,5) = min { h(3,5) , q(Red,Gray,0,5) + h(0,5)} = min {8 , 3 + 5} = 8
h(3,2) = min { h(3,2) , q(Blue,Gray,0,5) + h(0,5) = min {Infinity , 3 + 5} = 8

Clearly, the values of h(s) have been improved for all the new states, thus we set S' = {(0,3),(3,5),(3,2)}.

Step 3:
Since S'is not empty we go to Step 2.

Step 2:
We have to generate the immediate successors of the elements of S'={(0,3),(3,5),(3,2)} and update thier h(s) values. The successors set are as follows:

Suc(0,3) = {(3,0),(3,3),(0,0),(0,5)}
Suc(3,5) = {(3,0),(0,5)}
Suc(3,2) = {(0,5),(0,2),(3,0),(3,5)}

Thus, we have to update the h(s) values for all s in {(3,0),(3,3),(0,5),(0,2),(3,5)}, observing that some values will be updated twice (eg. h(0,5) and h(3,0). Reminder: it is not necessary to update the value of h(0,0).

h(3,0) = min { h(3,0) , q(Blue,Gray,3,0) + h(0,3) = min {3 , 3 + 6} = 3
h(3,3) = min { h(3,3) , q(Red,Gray,0,3) + h(0,3)} = min {Infinity , 3 + 6} = 9
h(0,5) = min { h(0,5) , q(Red,Blue,0,3) + h(0,3)} = min {5 , 2 + 6} = 5
h(3,0) = min { h(3,0) , q(Gray,Blue,3,5) + h(3,5) = min {3 , 3 + 8} = 3
h(0,5) = min { h(0,5) , q(Gary,Red,3,5) + h(3,5)} = min {5 , 3 + 8} = 5
h(0,5) = min { h(0,5) , q(Gray,Blue,3,2) + h(3,2)} = min {5 , 3 + 8} = 5
h(0,2) = min { h(0,2) , q(Gray,Red,3,2) + h(3,2) = min {Infinity , 3 + 8} = 11
h(3,0) = min { h(3,0) , q(Blue,Red,3,2) + h(3,2)} = min {3 , 2 + 8} = 3
h(3,5) = min { h(3,5) , q(Red,Blue,3,2) + h(3,2)} = min {8 , 3 + 8} = 8

We observe that the value of h(s) has improved for the elements of S' = {(3,3),(0,2)}.

Step 3:
Since S' is not empty we go to Step 2.

We continue in this manner until S' is empty.

The following picture describes how the states were generated above. The blue line indicates the point where we stopped the process. It also indicates that had we decided to go though one more iteration (Step 2) two news states would have been generated: (1,5) and (2,0).

For your convenience there is an interactive facility to execute the algorithm. It represents the h(g,b) values as a matrix and a control button can be used to move to the next iteration. The current content of the set S' at each iteration is also indicated.

h(g,b)

b

g

 0 1 2 3 4 5 0 1 2 3
Instructions:
• "-" represents "Infinity".
• Click on the "Iterate" button to compute the new updated values of h(g,b)
• Click on the "Reset" button to start the algorithm from scratch
• The green cells represent the set S', namely the collection of states whose h(s) value has been improved during the last iteration. The h(s) values of the immediate successors of these states will be updated in the next iteration
• Click on the "Report" button to produce a report on the iterations.
• Click on the "Recover" button to generate the optimal sequence of pourings.

Well, this is all that is ready at present. We shall update this module soon.

 Contributed by
© The University of Melbourne 1994-2000.