Kosaraju’s SSC algorithm is used to find the strongly connected components(SSC) in a directed graph. What are SSC? An intuitive definition is that SSC’s are parts of the graph where you can move from one node to any other node in that part. Kosaraju’s devious algorithm works by using **two passes of DFS.**

**In the first pass**, the directed graph is ** reversed **, we also label the nodes form n to 1 and traverse the graph from n to 1, for each node we invoke a Depth First Search[i.e DFS()] on that node and we keep track of the finishing time of each node via a global variable. This will be more clear with the example given below:

To give you an **example of the first pass**, lets say the nodes n, n-1 and n-2 are strongly connected( i.e that have a directed cycle that allows us to traverse from one node to any other node). Then in the first pass DFS() would be called first on node n, which would automatically cover node n-1 and node n-2 due to DFS’s property of being able to cover all reachable nodes from the source vertex, in this case ‘n’. After the DFS() terminates, we go back to the outer loop of the first pass and the continue to invoke DFS from node (n-3). So at the end of the first pass, the reversed graph would been completely traversed and we obtain the finishing time of each node, which we will use in the second pass.

**In the second pass,** we traverse the original graph(not reversed) again, this time we will traverse from the highest finishing time to the lowest, for each finishing time we invoke DFS() again on each node. In the second pass we also keep track of ‘leaders’. Each node is assigned a leader, the leader in turn is a global variable that is assigned to a node before we invoke DFS() on it. This will be more clear with the example given below:

To give you an **example of the second pass**, let’s say we are iterating on a graph with for which we have already computed the finishing times in the first pass. Let’s say this graph has 8 nodes. So we iterate from the node having the highest finishing time( in this case the finishing time will be 8) and the leader is this node. When DFS() is invoked on this leader, all nodes that are reachable from the leader get it’s leader equal to the current leader(node with finishing time of 8). At the end of the second pass, all nodes having the same leader are strongly connected components.

Question:

The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge, the vertex label in first column is the tail and the vertex label in second column is the head (recall the graph is directed, and the edges are directed from the first column vertex to the second column vertex). So for example, the 11th row looks liks : “2 47646”. This just means that the vertex with label 2 has an outgoing edge to the vertex with label 47646

Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.

Output Format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of sizes, separated by commas (avoid any spaces). So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be “500,400,300,200,100” (without the quotes). If your algorithm finds less than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be “400,300,100,0,0” (without the quotes). (Note also that your answer should not have any spaces in it.)

Example1:

Input file is:

1 4 2 8 3 6 4 7 5 2 6 9 7 1 8 5 8 6 9 7 9 3

Output:

Leader of Node 1 is: 1 Leader of Node 4 is: 1 Leader of Node 7 is: 1 Leader of Node 9 is: 9 Leader of Node 3 is: 9 Leader of Node 6 is: 9 Leader of Node 8 is: 8 Leader of Node 5 is: 8 Leader of Node 2 is: 8 The largest five SSC are: [3, 3, 3, 0, 0]

Click here for Github link to below code

Main.java:

package stronglyConnectedComponents; import java.io.BufferedReader; import java.io.IOException; import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.Paths; import java.util.Arrays; public class Main { public static void main(String[] args) { //String inputPath = args[0]; String inputPath = "/home/ninad/workspace/java_personal/resources/ssc_testcase1.txt"; Path path = Paths.get(inputPath); Graph graph = new Graph(); Graph reverseGraph = new Graph(); try { BufferedReader reader = Files.newBufferedReader(path); String line = null; while((line = reader.readLine()) != null){ if (line.isEmpty() || line.trim().equals("") || line.trim().equals("\n")) { continue; } String[] edge = line.split("\\s"); Integer fromVertex = Integer.parseInt(edge[0]); Integer toVertex = Integer.parseInt(edge[1]); graph.addEdge(fromVertex,toVertex); reverseGraph.addEdge(toVertex, fromVertex); } } catch (IOException e) { e.printStackTrace(); } //System.out.println("Graph is: \n" + graph.toString()); //DFS.depthFirstSearch(graph,new Node(1)); DFS.dfsTwoPass(graph,reverseGraph); System.out.println("The largest five SSC are: " + Arrays.toString(DFS.sscLargestFive)); //System.out.println("processed graph is: " + graph.toString()); } }

Data Structures:

Node.java

package stronglyConnectedComponents; public class Node { Integer nodeId; Node leader; int finishingTime; public Node(int nodeId) { this.nodeId = nodeId; } public Node() { } @Override public String toString() { return nodeId.toString(); } @Override public boolean equals(Object other) { if (!(other instanceof Node)) { return false; } Node otherNode = (Node) other; return this.nodeId.equals(otherNode.nodeId); } @Override public int hashCode() { int hashCode = 1; hashCode = hashCode * 37 + this.nodeId.hashCode(); return hashCode; } public Node getLeader() { return leader; } public void setLeader(Node leader) { this.leader = leader; } public int getFinishingTime() { return finishingTime; } public void setFinishingTime(int finishingTime) { this.finishingTime = finishingTime; } public Integer getNodeId() { return nodeId; } public void setNodeId(Integer nodeId) { this.nodeId = nodeId; } }

Graph.java

package stronglyConnectedComponents; import java.util.ArrayList; import java.util.HashMap; import java.util.HashSet; import java.util.Iterator; import java.util.Map; import java.util.Map.Entry; import java.util.Set; public class Graph { public HashMap&amp;lt;Node, ArrayList&amp;gt; edgeMap = new HashMap&amp;lt;Node, ArrayList&amp;gt;(); public HashMap&amp;lt;Node, Boolean&amp;gt; isExploredMap = new HashMap&amp;lt;Node, Boolean&amp;gt;(); // nodeList is not strictly necessary, but I have added it for better code // readability during dfs_loop() public Set nodeList = new HashSet(); public void addEdge(Integer fromVertex, Integer toVertex) { Node fromNode = new Node(fromVertex); Node toNode = new Node(toVertex); addEdge(fromNode, toNode); } public void addEdge(Node fromNode, Node toNode) { if (this.edgeMap.containsKey(fromNode)) { ArrayList valueNodes = this.edgeMap.get(fromNode); valueNodes.add(toNode); this.edgeMap.put(fromNode, valueNodes); isExploredMap.put(toNode, false); nodeList.add(toNode); } else { ArrayList valueNode = new ArrayList(); valueNode.add(toNode); this.edgeMap.put(fromNode, valueNode); // add to isExploredMap isExploredMap.put(fromNode, false); isExploredMap.put(toNode, false); // Add nodes to list nodeList nodeList.add(fromNode); nodeList.add(toNode); } } public void resetToUnexplored() { Iterator&amp;lt;Map.Entry&amp;lt;Node, Boolean&amp;gt;&amp;gt; iter = isExploredMap.entrySet().iterator(); while (iter.hasNext()) { iter.next().setValue(false); } } @Override public String toString() { StringBuilder result = new StringBuilder(); for (Entry&amp;lt;Node, ArrayList&amp;gt; entry : edgeMap.entrySet()) { String edge = new String(entry.getKey().toString() + "=&amp;gt;" + entry.getValue().toString() + "\n"); Node leader = entry.getKey().leader; result.append(edge); if (leader != null) { String printLeader = " Leader is (" + leader.toString() + ") \n"; result.append(printLeader); } } return result.toString(); } }

The DFS class implements Kosaraju’s SSC algorithm:

DFS.java

package stronglyConnectedComponents; import java.util.ArrayList; public class DFS { public static int finishingTimeGlobal; public static Node leaderGlobal; public static ArrayList sortedWithFinishingTime = new ArrayList(); public static int[] sscLargestFive = new int[5]; static int ssc = 0; // public static ArrayList sscList = new ArrayList(); public static void depthFirstSearch(Graph graph, Node v) { dfs(graph, v); } public static void dfsTwoPass(Graph graph, Graph revGraph) { // First Pass finishingTimeGlobal = 0; leaderGlobal = null; graph.resetToUnexplored(); for (Node node : graph.nodeList) { dfsLoop(revGraph, node, true); } // Now we got all the finishing times in the first pass. Do a second dfs // pass with // decreasing order of finishing times graph.resetToUnexplored(); for (int j = sortedWithFinishingTime.size() - 1; j &amp;gt;= 0; j--) { Node n = sortedWithFinishingTime.get(j); dfsLoop(graph, n, false); } addIfLarger(ssc); } private static void addIfLarger(int currentSSC) { // addIfLarger will add currentSSC if it belongs to the highest five // SSC's // First we find the smallest number in the top five SSC sscLargestFive // list int smallest = sscLargestFive[0]; int pos = 0; for (int i = 1; i &amp;lt; sscLargestFive.length; i++) { if (smallest &amp;gt; sscLargestFive[i]) { smallest = sscLargestFive[i]; pos = i; } } // if the current ssc count is larger than the smallest number in top // five, then replace that number with currentSSC // sscLargestFive if (currentSSC &amp;gt; smallest) { sscLargestFive[pos] = currentSSC; // sscList.add(currentSSC); } } public static void dfsLoop(Graph graph, Node node, boolean isFirstPass) { Boolean isExplored = graph.isExploredMap.get(node); // if node is not yet explored if (!isExplored) { if (!isFirstPass) { // add ssc count to sscList and reset ssc. Then iterate over // sscList to get top 5 ssc addIfLarger(ssc); ssc = 0; } leaderGlobal = node; modifiedDfs(graph, node, isFirstPass); } } private static void modifiedDfs(Graph graph, Node v, boolean isFirstPass) { // mark v as explored if (graph.isExploredMap.get(v) == false) { // System.out.println("Visited Node: " + v.toString()); // mark v as explored graph.isExploredMap.put(v, true); // in second pass we have to keep track of the leader if (!isFirstPass) { // set leader(v) = leader v.leader = leaderGlobal; ssc++; System.out.println("Leader of Node " + v.toString() + " is: " + v.leader.nodeId); } } // for each edge(v,w), if w is unexplored call dfs on w ArrayList adjNodes = graph.edgeMap.get(v); if (adjNodes != null) { for (Node n : adjNodes) { if (graph.isExploredMap.get(n) == false) { modifiedDfs(graph, n, isFirstPass); } } } finishingTimeGlobal++; v.finishingTime = finishingTimeGlobal; // in the first pass we have to keep track of the finishing time if (isFirstPass) { sortedWithFinishingTime.add(v); } } private static void dfs(Graph graph, Node v) { // mark v as explored if (graph.isExploredMap.get(v) == false) { System.out.println("Visited Node: " + v.toString()); // mark v as explored graph.isExploredMap.put(v, true); } // for each edge(v,w), if w is unexplored call dfs on w ArrayList adjNodes = graph.edgeMap.get(v); if (adjNodes != null) { for (Node n : adjNodes) { if (graph.isExploredMap.get(n) == false) { dfs(graph, n); } } } } }

There are many ways to optimize this algorithm for instance, not using another graph data structure for reversing the graph, implementing DFS iteratively to avoid the Stack Overflow error you might get when using recursion(If you still prefer to use recursion like me, make sure to increase the -Xss1m VM argument to a higher number when executing over very large graphs. I’m out of time and for now I will be content with just a working implementation of Kosaraju’s algorithm.

~Ninad