Dependency step order using Graphs

In this article we’ll analyze how to efficiently generate a dependency tree from a list of isolated dependencies, optimizing the number of parallel steps that can be executed.

This is a very recurrent computer science problem, appearing on package managers (like npm or Cargo), for example. This kind of optimization can also help ordering supply chain tasks where some steps depend on the completion of other steps.

For our development environment, we’re gonna use Haskell, and the library fgl. To create a new project using Stack, we can run the following command:

stack new graph-dependency

This will create a new empty project. Next we need to import the library. For that, we need to edit file package.yaml to have the following library section:

library:
 source-dirs: src
 dependencies:
 - fgl

The fgl library allows us to easily manipulate Graphs. Graphs are ubiquitous structures that help model real world problems, like social networks relationships, or supply chain steps involved in producing a given good.

We can create a simple data structure to represent the dependencies across steps:

import Data.Graph.Inductive

noEdges :: [UEdge]
noEdges = []

simpleGraph :: Gr Char ()
simpleGraph = mkGraph (zip [1..3] "abc") noEdges

calculateDependencyTree :: IO ()
calculateDependencyTree = print simpleGraph

The result is this simple Graph, with tree vertexes and no edge:

mkGraph [(1,'a'),(2,'b'),(3,'c')] []

Then we create simple edges representing 1 depending on 2 and 2 depending on 3:

labUEdges :: [Edge] -> [UEdge]
labUEdges = map (\(i,j) -> (i,j,()))

simpleGraph :: Gr Char ()
simpleGraph = mkGraph (zip [1..3] "abc") (labUEdges [(1,2), (2,3)])

mkGraph [(1,'a'),(2,'b'),(3,'c')] [(1,2,()),(2,3,())]

Next we create a function that returns one valid step execution

calculateValidStepExecution :: DynGraph gr => gr Char () -> [[Node]]
calculateValidStepExecution graph = filter (not. null) $ ufold (\(_, node, _, outgoingEdges) _ ->
    let outgoingNodes = map snd outgoingEdges
        previousSteps = if null outgoingNodes then [] else outgoingNodes
    in calculateValidStepExecution (nfilter (`notElem` (node : outgoingNodes)) graph) ++ [previousSteps] ++ [[node]]
    ) [] graph

This function calls ufold to reduce the Graph into a list of list of Int, representing the following structure:

[
	[ 1, 2, 3 ], -- Steps that can be executed in parallel first
	[ 4, 5, 6 ], -- Steps that can be executed in parallel next
	...
	[ 98, 99, 100 ], -- Last steps to run
]

So we use the ufold function to reduce the Graph to a single value. It receives a function as the first parameter, the initial value as the second parameter, and the Graph as the last parameter. The expected function for the first parameter receives 4 parameters (incoming nodes, currentNode, currentNodeLabel, outgoingNodes), and we ignore the first and the third. We then proceed to return the recursive call to calculateValidStepExecution passing as parameter the Graph minus the current node and the current outgoing node and appending it to the outgoing nodes and the current node.

calculateValidStepExecution simpleGraph

> [[3],[2],[1]]

We see that the algorithm is working. Iterating over it, we can see that, for each node, we remove it from the Graph, take the outbound nodes as the previous steps that can run in parallel, and call the function over again, now filtering both current node and the other noted as previous steps:

notSoSimpleGraph :: Gr Char ()

notSoSimpleGraph = mkGraph (zip [1..5] "abcde") (labUEdges [(1,2), (1,3), (2,4), (2,5), (4,5)])

calculateValidStepExecution notSoSimpleGraph

> [[5],[4],[2,3],[1]]

We can see that the function was capable of finding the optimal execution order for the scenario.

One thing to note is that the algorithm only works if the nodes are in strictly crescent order. To show the issue, we can run the function for a reversed order Graph:

reverseSimpleGraph :: Gr Char ()

reverseSimpleGraph = mkGraph (zip [1..3] "abc") (labUEdges [(3,2), (2,1)])

calculateValidStepExecution reverseSimpleGraph

> [[3],[2],[1]]

Sources:

Chris Okasaki and Andy Gill, “Fast Mergeable Integer Maps“, Workshop on ML, September 1998 https://web.archive.org/web/20150417234429/https://ittc.ku.edu/~andygill/papers/IntMap98.pdf

D.R. Morrison, “PATRICIA — Practical Algorithm To Retrieve Information Coded In Alphanumeric“, Journal of the ACM, 15(4), October 1968, pages 514-534 https://dl.acm.org/doi/pdf/10.1145/321479.321481

Martinn Erwig, “Inductive Graphs and Functional Graph Algorithms”, Journal of Functional Programming, Vol. 11, No. 5, 467-492, 2001
https://web.engr.oregonstate.edu/~erwig/papers/InductiveGraphs_JFP01.pdf

Sample code:

https://github.com/nikalumoglich/graph_dependency

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *