OR-Notes

J E Beasley

OR-Notes are a series of introductory notes on topics that fall under the broad heading of the field of operations research (OR). They were originally used by me in an introductory OR course I give at Imperial College. They are now available for use by any students and teachers interested in OR subject to the following conditions.

A full list of the topics available in OR-Notes can be found here.

Graph theory

Introduction

Graph theory deals with problems that have a graph (or network) structure. In this context a graph (or network as many people use the terms interchangeable) consists of:

vertices/nodes - which are a collection of points; and
arcs - which are lines running between the nodes. Such arcs may be directed or undirected and undirected arcs are often called links or edges.

An example graph is shown below.

Graph theory is used in dealing with problems which have a fairly natural graph/network structure, for example:

road networks - nodes = towns/road junctions, arcs = roads
communication networks - telephone systems
computer systems
foreign exchange/multinational tax planning (network of fiscal flows)

Note here that the minimum cost network flow problem (also dealt with in this course) is an example of a problem with a graph/network structure.

Some of you may be reading this document via the Web. That is also a graph, with each document (file) being a node and each hypertext link (the thing you click on to go elsewhere) an arc.

Graph theory has a relatively long history in classical mathematics. In 1736 Euler solved the problem of whether, given the map below of the city of Konigsberg in Germany, someone could make a complete tour, crossing over all 7 bridges over the river Pregel, and return to their starting point without crossing any bridge more than once.

(map courtesy of The MacTutor History of Mathematics Archive).

See if you can reach the same conclusion as Euler did. The picture below shows the city, but simplified so that just the river and bridges are shown - do you think that someone could make a complete tour, crossing over all 7 bridges over the river, and return to their starting point without crossing any bridge more than once.

The area of algorithmic graph theory (algorithms to solve graph theory problems) came to prominence in the mid 1970's with the work of Nicos Christofides (Imperial College Management School) and has become more fashionable/well-known since then.

In this note we consider two graph theory problems relating to spanning trees and shortest paths in detail and outline the multinational tax planning problem. More about graph theory can be found here.

Spanning trees

Consider the following problem. In the diagram shown below we have four wells in an offshore oilfield (nodes 1 to 4 below) and an on-shore terminal (node 5 below). The four wells in this field must be connected together via a pipeline network to the on-shore terminal.

The various pipelines that can be constructed are shown as links in the diagram below and the cost of each pipeline is given next to each link. What pipelines would you recommend be built?

It is clear that this particular problem is a specific example of a more general problem - namely given a graph (such as that shown above) which links would we use so that:

the total cost of the links used is a minimum; and
all the points are connected together.

This problem is called the shortest spanning tree (SST) problem. Note here that the phrase "minimal spanning tree" (MST) is also often used instead of the phrase "shortest spanning tree" (SST).

For example, in the diagram above, one possible structure connecting all the points together would consist of the links 1-2, 2-3, 3-4, 4-5 and 5-1, but there are other structures where the total cost of the links used is smaller than in this structure (e.g. drop any link in the circle (cycle) 1-2-3-4-5-1 formed above).

Points to note :

for an n-node graph the solution involves (n-1) links
the solution can be easily arrived at by using an algorithm due to Kruskal (developed in the mid 1950's)
the final structure is called the shortest spanning tree (SST) of the graph.

We consider below the Kruskal algorithm for finding the shortest spanning tree of a graph.

Kruskal algorithm

The algorithm is:

"For a graph with n nodes keep adding the shortest (least cost) link - avoiding the creation of circuits - until (n-1) links have been added"

Note here that the Kruskal algorithm only applies to graphs in which all the links are undirected.

For the graph shown above, applying the Kruskal algorithm and starting with the shortest (least cost) link, we have:

Link   Cost   Decision
1-5    2      add to tree
1-2    3      add to tree
1-4    3      add to tree
5-4    4      reject as forms circuit 1-5-4-1
5-2    5      reject as forms circuit 1-5-2-1
2-3    6      add to tree 
Stop as 4 links have been added and these are all 
we need for the SST of a graph with 5 nodes.

Hence the SST of the graph shown above consists of the links 1-5, 1-2, 1-4 and 2-3 (total cost 14) and is as shown below.

Note here that in applying the Kruskal algorithm we often encounter the situation in which two (or more) links have the same cost (e.g. consider links 1-2 and 1-4 in the above example). In such a case the order in which these (equal cost) links are considered does not matter [different SST's may result depending upon the order chosen but they will all have the same total cost and this cost will be the minimum that can be achieved].

The problem shown above was also solved using the package, the input being shown below.

The output is shown below.

The principal limitation of the pipeline network shown is (in practical terms) that we have ignored the fact that some pipelines carry more oil than others and this means that they must therefore be of larger diameter and hence more expensive (per unit length).

We therefore have the situation that the cost of a pipeline depends upon the amount of oil it must carry - but we do not know how much oil any particular pipeline is carrying until we have decided the entire pipeline network.

Iterative application of SST calculations provides one way of solving such practical pipeline design problems. Typically we first set a cost for each pipeline, then

solve the SST problem
calculate how much oil is carried in each pipeline
adjust pipeline costs accordingly and reloop

Other difficulties include:

mixing oil of different grades
gases given off
use of compressors
geographic/political problems in getting the pipeline to go where we want
network very vulnerable to a failure in a link or at a node
time periods, the flow from each well will vary over the years.

A pioneering application of SST models to pipeline network design (due to Frank and Frisch) occurred in the early 1960's in the design of a gas pipeline network in the Gulf of Mexico. Savings resulted of the order of tens of millions of dollars. This illustrates how, in OR, nice neat theoretical problems (like the SST) can be used as building blocks to solve complex practical problems.

Network design

The above problem can be considered as a network design problem. Such problems typically involve a number of distinct points (nodes) and connections between pairs of nodes (links) and the problem is to design a suitable network to connect together a number of the nodes at minimum cost and which satisfies a number of criteria. For example we may want a minimum cost network which still functions after a failure in any one of the links.

Such problems are increasingly important nowadays, in particular in the telecommunications industry. In the simplest case the nodes are given, each possible link has a cost associated with its use and the problem is which links should be used so as to minimise total cost whilst ensuring that the network is connected. This problem is precisely the SST problem dealt with above.

Other factors that can occur here are:

there are specified flows between all pairs of nodes and capacities on the links
we may require the network to satisfy certain connectivity requirements (e.g. the network remains connected/viable if any link fails).

More generally we can distinguish between vertex connectivity and edge connectivity:

A graph is said to have a vertex connectivity of K if the minimum (smallest) number of vertices we need to remove to disconnect the graph is K
A graph is said to have an edge connectivity of K if the minimum (smallest) number of edges we need to remove to disconnect the graph is K

where by disconnect we mean that the graph remaining after removal is separated into two (or more) pieces.

For example the SST derived above has vertex connectivity 1 and edge connectivity 1.

For the pipeline problem considered above (shown again below):

what is a network that has edge connectivity 2?

one solution here is that any circuit (complete tour of the nodes where no edge is used more than once) has edge connectivity 2.

what is a network that has edge connectivity 3 ?

one solution here is to take the original network, but with the arc from 1 to 2 dropped, giving a network with edge connectivity 3

what is the minimum cost network that has edge connectivity 2?

the simple answer is that I do not know - but it is a valid question - and maybe one we could answer if we delved deeper into the subject of network design

what is the network that is least vunerable to link (edge) failure, i.e. the network that maximises edge connectivity?

The answer to the last question of what is the network that is least vunerable to link (edge) failure, i.e. the network that maximises K edge-connectivity is the picture shown above, i.e. the best network (least vunerable to link failure) is to build all of the pipelines shown. This will be the most expensive network we could have (costing 48). Hence we have an obvious tradeoff between how vunerable a network is to failure and its cost, from the SST (of cost 14) which is vunerable to a single link failure to the complete network which costs 48.

One example of a network design problem in the US military context was rewiring of a battleship. Given certain centers (nodes) within a battleship what should be the communication (wire) network between these centers so that communication is still maintained under certain damage scenarios.

Note here that the converse of this problem, given a network where should we damage it so as to meet certain criteria, also occurs. For example, for the network below, where the numbers by each arc represent the cost of removing the arc, which arcs would you take out to disconnect the source node 1 from the sink node 8 at minimum cost?

The solution is shown below, it can be found by solving the above problem as a minimum flow problem with the costs of the above arcs acting as capacities. See the network flow notes for more about this maximum flow problem.

Euclidean problems

Note here that the Kruskal algorithm not only applies to problems in which the links (connections) that can be made between vertices are explicitly given (as in the example considered above) but also to Euclidean problems of the type shown below (where any pair of points can be connected together).

We give below the application of the Kruskal algorithm to the graph shown above.

The x and y coordinates of the points shown in the graph above are:

Point (x,y)   Point (x,y)   Point (x,y)
1     (0,3)   5     (8,6)   9     (8,12)
2     (0,1)   6     (9,4)   10    (10,12)
3     (4,4)   7     (10,4)  11    (7,15)
4     (4,0)   8     (12,2)

So, applying the Kruskal algorithm and starting with the shortest link, we have:

Link   Length    Decision
6-7    1         add to tree
1-2    2         add to tree
9-10   2         add to tree
5-6    (5)^0.5     add to tree
7-8    (8)^0.5     add to tree
5-7    (8)^0.5     reject as forms circuit 5-6-7-5
9-11   (10)^0.5    add to tree
6-8    (13)^0.5    reject as forms circuit 6-7-8-6
3-4    4         add to tree
1-3    (17)^0.5    add to tree
2-4    (17)^0.5    reject as forms circuit 2-1-3-4-2
etc until we end up with the SST shown below

This SST is of total length

1+2+2+(5)^0.5+(8)^0.5+(10)^0.5+4+(17)^0.5+(20)^0.5+6 = 31.82

with the SST composed of the links 6-7, 1-2, 9-10, 5-6, 7-8, 9-11, 3-4 ,1-3, 3-5 and 9-5.

Note here that for Euclidean SST problems we have the (implicit) assumption that we can only connect vertices via existing vertices. If we are allowed to add new vertices to the graph then we are looking for the shortest Steiner tree and these new vertices are called Steiner vertices.

Note here that whilst you may consider that adding new vertices is not worthwhile the example below illustrates that by so doing we may end up with a solution costing less than the SST.

Whilst finding the appropriate number of new vertices to add (and their positions) is a difficult task it is known that (for Euclidean problems) each Steiner vertex will have three links associated with it with 120 degrees between these three links.

To illustrate this we show below how the addition of the two vertices to the SST shown above lead to the overall length of the tree being reduced from 31.82 to 30.89 (note that there may well be other vertices which we could add to further reduce the length of the tree).

Shortest path

Consider the diagram below where we have a graph with numbers associated with each arc representing the length of the arc. Of the possible paths from vertex 1 to vertex 5 (e.g. 1-2-4-5, 1-3-4-5, etc) how can we find the shortest path from vertex 1 to vertex 5?

Clearly with such a small example we could find the shortest path from vertex 1 to vertex 5 simply by inspection, or by enumerating all possible paths between vertex 1 and vertex 5. However if the graph was much larger then these approaches would not be feasible.

There exists an algorithm (developed by Dijsktra in the early 1960's) for finding the shortest path in a graph with non-negative edge/arc costs (without explicitly enumerating all possible paths). This algorithm is based upon a technique known as dynamic programming. We shall not consider this algorithm here but instead use the package to find the shortest path from vertex 1 to vertex 5 in the graph shown above.

The input and output for the package (for the example shown above) is shown below.

An example of the usefulness of the shortest path algorithm is the software package Autoroute Express which contains a graphical representation of the UK road network. Applying the shortest path algorithm to such a graph enables us to plan routes for delivery vehicles on the basis of the shortest distance route between stops, or the shortest time route between stops (essentially divide a road into segments, assign a category to each road segment and a vehicle speed for each category).

To give you an idea of the computer time required, the shortest path between any two points on the graph representing major UK roads could be calculated in a relatively short time (order of seconds) on a pc.

Multinational tax planning

To illustrate the multinational tax planning problem we shall use the example multinational company structure shown below where we have a Dutch holding company with three subsidiaries based in Japan, Hong Kong and Singapore.

The financial flows between these companies with which we are concerned can (broadly) be classified into three categories for tax purposes:

dividends (paid to shareholders);
royalties (e.g. paid on a license for a production process); and
interest (on loans).

When a financial flow falling into one of these categories occurs between two companies in different countries the treatment of that flow for tax purposes (i.e. the amount taken in tax in each country) is governed by a tax treaty between the two countries. Different pairs of countries have different tax treaties which immediately raises the possibility of paying a different amount of tax depending upon the route the financial flow follows.

For example an interest payment from the Japanese subsidiary direct to the Dutch holding company may be worse for tax purposes than if that interest payment is made to the Hong Kong subsidiary and then the Hong Kong subsidiary makes an interest payment to the Dutch holding company.

Just as we saw above for the SST, where adding extra nodes to a problem led to a decrease in cost, it also may be (and often is) advantageous to add extra nodes to this problem. For example to route a financial flow through a number of well-known "tax-havens" such as Netherlands Antilles and Bermuda, e.g. we could route our interest payment from Japan to the Dutch holding company via a Netherlands Antilles based company. Obvious extensions to this basic idea include:

routing financial flows through any number of countries to get the most advantageous tax treatment
"changing" the type of a particular financial flow (e.g. an interest flow to one country passed on as a royalty flow to another) to get the most advantageous tax treatment
choosing a "permanent" company structure (cross shareholdings, loan arrangements, etc) to get the most advantageous tax treatment.

Whilst tax treaties are very complex and not capable of being completely reduced to a set of formulae which can be used in an OR model some progress has been made using dynamic programming (work in this area started in the late 1970's). Hence it is now (essentially) possible to automatically generate the best route for a particular financial flow to follow in going between two specified countries. As a result there are computer packages around (mostly in banks) which can be of great assistance in multinational tax planning.

Note here that a more emotive phrase for the multinational tax planning problem is "tax avoidance".

Graph theory example 1995 UG exam

A company is considering building a gas pipeline to connect 7 wells (A,B,...,G) to a process plant H. The possible pipelines that they can construct and their costs are listed below.

Pipeline   Cost (£'0000)    Pipeline   Cost (£'0000) 
AB         23               CG         10
AE         17               DE         14
AD         19               DF         20
BC         15               EH         28
BE         30               FG         11
BF         27               FH         35

What pipelines do you suggest be built and what is the total cost of your suggested pipeline network.

Solution

The graph is shown below.

Applying the Kruskal algorithm and starting with the shortest (least cost) link, we have:

Link   Cost   Decision
CG     10     add to tree
FG     11     add to tree
DE     14     add to tree
BC     15     add to tree
AE     17     add to tree
AD     19     reject as forms circuit ADE
DF     20     add to tree
AB     23     reject as forms circuit ABCGFDE
BF     27     reject as forms circuit BFGC
EH     28     add to tree

Stop as 7 links have been added and these are all we need for the SST of a graph with 8 nodes.

Hence the SST has total cost 115 (£'000) and is as shown below.

Graph theory example 1991 UG exam

For the graph shown below calculate, showing all steps in the algorithm used, the shortest spanning tree. The numbers on the edges designate the distance between the corresponding pairs of nodes.

Solution

We have to find the shortest spanning tree (SST) of the graph so we use the Kruskal algorithm. Starting with the shortest (least cost) link we have:

Link  Length  Decision
AC    1       add to tree
CD    2       add to tree
CE    3       add to tree
AD    3       reject as forms circuit A-C-D-A
DB    4       add to tree
EG    4       add to tree
AB    5       reject as forms circuit A-D-B-A
GF    5       add to tree

Stop as 6 links have been added and these are all we need for the SST of a graph with 7 nodes.

The SST is shown below and has total length 1+2+3+4+4+5 = 19

Graph theory example 1985 UG exam

A company has eight plants at locations A, B, C, D, E, F, G and H whose Euclidean coordinates (in km.) are:

A: (50, 40)   E: (90, 50)
B: (90, 10)   F: (30, 80)
C: (50, 70)   G: (50, 20)
D: (50, 80)   H: (10, 60)

These plants are to be connected together by means of a telephone network using leased British Telecom lines. The cost of leasing a single line between two points is £50 per year per km. of line. The length of a line between two points is assumed to be the Euclidean (straight-line) distance. A message from one plant to another will travel only along leased lines, and will be switched from one line to another (if necessary) by switching centres located at the nodes where lines meet.

assuming that switching centres can only be located at the plants themselves, what is the least-cost telephone network that the company should lease and how much would it cost?
if switching centres can be located anywhere in the Euclidean plane, suggest one alternative telephone network that would cost less.
discuss some shortcomings of the telephone networks found in (1) and (2) above.

Solution

The situation is as shown below.

We have to find the shortest spanning tree (SST) so we use the Kruskal algorithm. Starting with the shortest link we have:

Link  Length    Decision
CD    10        add to tree
DF    20        add to tree
AG    20        add to tree
CF    (500)^0.5   reject as forms circuit DFCD
FH    (800)^0.5   add to tree
AC    30        add to tree
BE    40        add to tree
AD    40        reject as forms circuit ADCA
AE    (1700)^0.5  add to tree

Stop as 7 links are all we need for the SST of a graph with 8 nodes.

The SST is shown below. Note here that we could have used GB instead of AE above.

The total cost of the SST is

50[10 + 20 + 20 + (800)^0.5 + 30 + 40 + (1700)^0.5] = £9475.77

If we are looking to add additional (Steiner) vertices one possibility for adding such vertices is shown below.

Here we have added two Steiner vertices and note the 120 degrees between the three links meeting at a Steiner vertex. The cost of the tree shown above is given by

50[10 + 20 + (800)^0.5 + 30 + 2(11.5) + 22.7 + 2(23.1)]

=£9009.21 (approx)

The shortcomings of the above telephone networks are principally:

getting the lines to go where we want (e.g. unlikely to get telecom lines to go in a straight line)
network very vulnerable to a failure at a node or in a link
cost of additional switching centres as against savings in line cost needs to be considered for the case of the additional (Steiner) vertices.

Graph theory example

For the graph shown below calculate the shortest spanning tree (SST) of the graph.

Solution

We use Kruskal's algorithm which is: "for a graph with n nodes keep adding the shortest (least cost) link - avoiding the creation of circuits - until (n-1) links have been added".

Hence we consider the links in ascending length order to get:

Link  Length  Decision
1-3   1       add to tree
2-3   2       add to tree
2-5   2       add to tree
3-5   3       reject as forms circuit (3,5,2)
4-5   3       add to tree
5-6   3       add to tree
3-4   4       reject as forms circuit (3,4,5,2)
5-7   4       add to tree

Stop as 6 links have been added and these are all we need for a graph with 7 nodes

This gives us the SST shown below of total length 15.