I’m going to introduce you to a simple, but very effective, data structure called the Disjoint-set data structure (or the Union-Find data structure, but Union and Find are its main operations). We are going to use it in the upcoming Project Euler 107, and it might come in handy in future Project Euler or UVa Online Judge problems.

The Disjoint-set data structure is useful when we have a collection of items that are divided into one or more pairwise disjoint sets (hence the name of the data structure), meaning that no item is in more than one set. An example of this are the connected components of an undirected graph, just like the one represented in the image below:

Here we have three connected components, or equivalently, three disjoint sets. I’ve colored them blue, yellow and red.

The Disjoint-set data structure allows us to very quickly determine if two items are in the same set (or equivalently determine if two vertices are in the same connected component), and also to very quickly unite two sets (or equivalently combine two connected components into one connected component).

In its simplest form, the data structure is just an array of integers, called . If we are dealing with items, then is an array of size , and the th element of the array represents the th item. More precisely, the th element of the array is the *parent* of the th item. These relationships create one, or more, virtual trees. Study the following image to better understand it.

There you can see that is under in the tree, because . Also note that and have no parents, because and .

Each tree is a disjoint set. If two elements are in the same tree, then they are in the same disjoint set. The root node (or the topmost node) of each tree is called the representative of the set. There is always a single unique representative of each set. As you can see, if is the representative of a set, then . If is not the representative of his set, then it can be found by traveling up the tree until we find the representative. This is the **Find** operation of the Disjoint-set data structure, and can be implemented with the following recursive code (in C#).

// Finds the representative of the set that i is an element of public int Find(int i) { // If i is the parent of itself if (Parent[i] == i) { // Then i is the representative of his set return i; } else { // Else if i is not the parent of itself // Then i is not the representative of his set, // so we recursively call Find on its parent return Find(Parent[i]); } }

Now that we’ve implemented the **Find** operation, the **Union** operation is simple to implement. It takes, as input, two elements, finds the representatives of their sets using the **Find** operation, and finally puts either one of the trees (representing the set) under the root node of the other tree, effectively merging the trees and the sets. If we would call **Union** on the two trees in the image example above, the result would be the following:

One thing you should notice, is that it doesn’t matter which tree goes under the other (we’ll exploit this fact for more efficiency later). So this could also have been the result:

I’ll leave it as an exercise for you to create the corresponding array. But for the **Union** operation, here is an example implementation (in C#):

// Unites the set that includes i and the set that includes j public void Union(int i, int j) { // Find the representatives (or the root nodes) for the set that includes i int irep = this.Find(i), // And do the same for the set that includes j jrep = this.Find(j); // Make the parent of i's representative be j's representative // (effectively moving all of i's set into j's set) this.Parent[irep] = jrep; }

We can then wrap the **Find** and **Union** operations into a *DisjointSet* class for convenience, and maybe add a few things.

// The Disjoint-set data structure public class DisjointSet { // The number of elements in the collection. public int Count { get; private set; } // The parent of each element in the collection. private int[] Parent; // Initializes a new Disjoint-Set data structure, with the specified amount of elements in the collection. public DisjointSet(int count) { this.Count = count; this.Parent = new int[this.Count]; // Initially, all elements are in their own set. for (int i = 0; i < this.Count; i++) { this.Parent[i] = i; } } // And here come the Union and Find operations. }

This is enough to get a functional Disjoint-set data structure, but there are still a couple of things we can do to greatly improve the efficiency of the data structure. The efficiency depends heavily on the height of the tree. Let’s have a look at two methods to minimize the height of tree in order improve the efficiency.

## Path Compression

Our first improvement is very simple, but also very effective. It’s called Path Compression, and speeds up the data structure by compressing the height of the trees on the fly. That can be achieved by inserting a small caching mechanism into the **Find** operation. Take a look at the code for more details:

// Finds the representative of the set that i is an element of public int Find(int i) { // If i is the parent of itself if (Parent[i] == i) { // Then i is the representative of his set return i; } else { // Else if i is not the parent of itself // Then i is not the representative of his set, // so we recursively call Find on it's parent, and save it in our result variable int result = Find(Parent[i]); // We then cache the result by moving i's node directly under the representative of his set Parent[i] = result; // And then we return the result return result; } }

Take a look at this example:

In both cases, I’ve called the Find operation on item number 2. Without path compression, nothing is changed in the tree. But with path compression, the items on the path from 2 and up to the root node are all modified to connect directly to the root.

## Union by Rank

Our previous improvement changed the **Find** operation to make the heights of the trees smaller. This improvement, called Union by Rank, changes the **Union** operation to also make the heights of the trees smaller. First of all, we need a new array of integers called of the same size as the array. Then, if is a representative of a set, is the height of the tree representing the set. Let’s add those changes to the *DisjointSet* class:

// The rank of each element in the collection. private int[] Rank; // Initializes a new Disjoint-Set data structure, with the specified amount of elements in the collection. public DisjointSet(int count) { this.Count = count; this.Parent = new int[this.Count]; this.Rank = new int[this.Count]; // Initially, all elements are in their own set. for (int i = 0; i < this.Count; i++) { this.Parent[i] = i; this.Rank[i] = 0; } }

Now recall that, in the **Union** operation, it doesn’t matter which of the two trees is moved under the other (see last two image examples above). Now what we want to do is minimize the height of the resulting tree. If we are uniting two trees (or sets), let’s call them left and right, then it all depends on the rank of left and the rank of right. If the rank of left is less than the rank of right, then it’s best to move left under right, because that won’t change affect the rank of right (while moving right under left would). In the same way, if the rank of right is less than the rank of left, then we should move right under left. If the ranks are equal, it doesn’t matter which tree goes under the other, but the rank of the result will always be one greater than the rank of the trees.

Let’s modify our **Union** operation with these improvements:

// Unites the set that includes i and the set that includes j public void Union(int i, int j) { // Find the representatives (or the root nodes) for the set that includes i int irep = this.Find(i), // And do the same for the set that includes j jrep = this.Find(j), // Get the rank of i's tree irank = Rank[irep], // Get the rank of j's tree jrank = Rank[jrep]; // Elements are in the same set, no need to unite anything. if (irep == jrep) return; // If i's rank is less than j's rank if (irank < jrank) { // Then move i under j this.Parent[irep] = jrep; } // Else if j's rank is less than i's rank else if (jrank < irank) { // Then move j under i this.Parent[jrep] = irep; } // Else if their ranks are the same else { // Then move i under j (doesn't matter which one goes where) this.Parent[irep] = jrep; // And increment the the result tree's rank by 1 Rank[jrep]++; } }

## Conclusion

Now we’ve got ourselves a great Disjoint-set data structure, ready to take whatever you throw at it. You should definitely try it out in a couple of problems, for example:

UVa 10608 – Friends

UVa 793 – Network Connections

UVa 10583 – Ubiquitous Religions

UVa 459 – Graph Connectivity

You should also stay tuned for our upcoming coverage of Project Euler 107, where we’ll use the data structure as a building block for Kruskal’s algorithm.

You can find the complete source code for our Disjoint-set data structure (with additional methods and properties) here, or you can find it in our new MathBlog code library.

So what do you think about the Disjoint-set data structure? Have used it before? Leave a comment and let us know.

under union by rank ..shoulnt it be :-

else {

// Then move i under j (doesn’t matter which one goes where)

this.Parent[irep] = jrep;

// And increment the the result tree’s rank by 1

Rank[jrep]++; // jrep instead of irep.

}

because Parent[irep] parent node is jrep.

so rank of jrep should be incr because it is a representative of set.

correct me if i am wrong.

You’re completely right, atul. Thanks for noticing!

I have one request.

Sorry for moving away from this topic.

As you are posting tutorial on implementing different type of data structure … if possible could you please publish similar article in implementing Suffix tree. I find it difficult to implement 🙁

I am not able to find any good tutorial on suffix tree on net..esp implementation.

Thank in advance.

Thanks for the suggestion, I’ll see what I can do.

Hi, What about the delete operation? How should it be implemented?

The Union-Find data structure doesn’t define a delete operation. But you can of course try to define how you would want the delete operation to work, and then go ahead and implement it if possible.

For example, say you wanted the delete operation to remove an item [latex]x[/latex] from the set it’s currently in, and move it into it’s own set. Then you could change the parent of every element who has [latex]x[/latex] as a parent to the parent of [latex]x[/latex]. Then you set the parent of [latex]x[/latex] to be [latex]x[/latex]. The only edge case here is when [latex]x[/latex] was the representative of the set, in which case you’d need to make some other element in the same set the parent before executing the above.

Awesome tutorial.. just what I was looking for.

A nice C# implementation.

http://onestopinterviewprep.blogspot.com/2014/03/union-find-datastructure.html

Hi, I need to clear some of concepts in disjoint sets.

How SET can be represented with graph trees?

Why we need to represent disjoint sets with arrays and linked lists?

Can you help me in clearing the above?

Hi

Very impressive explanation.

Thanks a lot. I was lokking fro a gud step by step implementaion for the same.you made it easy to understand.

Br

Kmspace

Awesome explanation.

Both path compression & union by rank cannot be applied together. Right?

For this to work, does the graph have to be acyclic? Otherwise, there can be a component that is a cycle – without a parent.

Very nice explanation. No need of anything else. Other comments are way back in time but i will surely type this comment to thank the author.

LateX bits are broken due to CORS 🙁

Hi,

I’d like to point something out that another comment also asks about but you didn’t respond to:

your path compression during the Find function can sometimes cause the rank of the set to change. This combined with union by rank means that the Union function might work with wrong rank values. So can both optimizations not be used together or is there a way to easily update the rank information during the find optimization? I can’t think of any right now.

I think that it’s not an issue with the path compression and union by rank and just need some adjusting steps.

After you did the path compression the rank of all elements of the subtree will be equal to 1.

So, you can assign the 1 value while doing the Find operation when you use Find With Path Compression.

The Union function will be correct anyway. Plus, if you will use the Union function without using The Find function before, it will work correctly also.