Recently I received an email from a guy named Alexandr who reads the blog. He send me a small piece of C# code with the statement that my Greatest Common Denominator (GCD) algorithm was not as efficient as it could be. As a side note this algorithm is also known as greatest common factor GCF and highest common factor HCF. To my defense I would like to state that I do believe that even my implementation of the algorithm is sufficiently good to solve most problems and it is a really good enough.

Being the geek that we all are it peeked my interest and I wanted to look a bit into it and test which one was faster as well as open the question for you to give your favorite GCD algorithm.

My algorithm relies on Euclid’s algorithm which is described on wikipedia and the implementation looks as

private static long gcd(long a, long b) { long y, x; if (a > b) { x = a; y = b; } else { x = b; y = a; } while (x % y != 0) { long temp = x; x = y; y = temp % x; } return y; }

So my no means a short piece of code. The code Alexandr sent me looks the following

private static long gcd2(long x, long y) { while (x * y != 0) { if (x >= y) x = x % y; else y = y % x; } return (x + y); }

So if nothing else, it is definitively smaller in terms of lines of code and therefore I liked it. Also it avoids the temporary variable to interchange x and y. The email provided me with Alexandr’s test results which showed that on his computer the result of running multiple times gave us the following results

00:00:04.5850623 00:00:03.4624319

with my GCD algorithm being the upper one which is 33% slower than the one he proposed. So that is indeed pretty nice. My own results show a similar result for running it the given number of times, but increasing the number of times lowers the gap, and at running it for 1-10.000 for both x and y reduced the difference to 14%. But still a good improvement. I know for sure that it has been changed in my library.

I know there are more efficient methods out there, so my question to you is how do you calculate the GCD?

I have already gotten the first response for this post which was a mail from Ram giving me a GCD algorithm using recursion. It looks like

According to my quick testing it is faster than two algorithms in the article. So nicely done with yet another implementation build upon Euclid’s algorithm.

Wow. That’s pretty cool. I always thought I had a pretty fast GCD function:

But after some tests, mine seems to be the slowest, and Alexandr’s the fastest.

I also tried improving his function and managed to make it a tiny bit faster:

Another improvement is to use int instead of long when you can.

Faster:

Even faster in C++:

Maybe there is an even faster way through prime factors of the numbers (I doubt it, but maybe for big numbers or some special cases).

I did some Google-ing and found this paper which might use some other method.

The C++ version above should have:

instead of

Shortest GCD:

Also, according to my tests, Alexandr’s code is faster than Ram’s code.

Wow, this post has really surprised me. I got an email with a faster version of GCD than what I used and with a bit of tweaking by others who have seen the code, we are now clocking in a 40% lower running time according to my test. That is really impressive work by you guys.

http://en.wikipedia.org/wiki/Binary_GCD_algorithm

Do we have a winner?

According to Wikipedia, due to it’s increased complexity combined with the dedicated hardware for integer division, the gain should be rather small. But I guess the best way is to implement it and see, At least it sounds like a good candidate.

The one I had been using is

but Ram’s, with a few additional optimizations seems better

Have you found even faster ones since making this post? I tried the binary GCD linked here earlier, but it was a fair amount slower than these ones based on mod.

No I haven’t looked into speeding it up more. It never seems to the GCD part that is taking the time.

Hi!

gcd6 has a typo: x & y should be x && y, otherwise gcd(11,4) returns 15, which is obviously wrong (damn you, binary and!)

In my tests, compiling with -O3, Ram’s was the fastest by about 1.5%.

I added the test into Alexandr’s as well as changing the loop from a while to a do while (since x and y shouldn’t be null) and now it’s marginally faster (<1%) than Ram's. Here's how it looks:

Overall this is about a 10% improvement on what I used before.

private static int gcd(int u, int v)

{

for(; ; )

{

u %= v;

if (u == 0) return v;

v %= u;

if (v == 0) return u;

}

}

// I think the first mod operation with u<v is fast, didn'test it.

// Compare to zero is fast.

// Faster: Knuth's TAOCP 4.5.2

Hello! I’ve been visiting your blog very often (usually when I get stuck on projecteuler) and I must say, this website is very well done!

I’m not good at mathematical algorithms (GCD algorithms, in this case) but I do have some comments to make about the implementation of the GCD algorithms in your post as well as in the replies to your post.

Firstly, recursive methods crunch up a huge amount of memory space. This is especially so if you were to pass arguments to it recursively by value, because whenever it is called recursively, the objects are copied (because of it being passed by value). Of course, this doesn’t matter much for small numbers.

Secondly, if you were to be optimizing your algorithms, I would suggest that you write a native library instead, and create a managed wrapper around this native library (perhaps using C++ CLI). The last time I wrote a matrix library on both native C++ and managed C#, the native library performed about 25 times faster in terms of repeated 10×10 matrix multiplication, even after marshalling.

Hi Lee

Thank you. I am glad you find it enlightening. I encourage people to solve it on their own, and once people have solved it read the writeup I have done. I also know how annoying it is to get stuck on something though.

Regarding the recursiveness. Yes it does have a larger memory footprint, I am not sure how smart compilers are regarding that today.

Regarding the native implementation I fully agree with you, but for such small computations as gcd I doubt that you will gain much.

However, I would love to see the differences you point out implemented and tested.

Hi Kristian,

I’ve written up a simple implementation of a managed wrapper around native methods that does GCD computations. It seems that the resultant performance was around twice as fast as managed methods.

You can check it out here! : http://codemyroad.wordpress.com/2013/06/20/native-code-vs-managed-code-performance-and-development/

Thanks for the write-up it is an interesting read. And it does make sense. Both you comment about speed, but also the amount of code. So like almost everything else, it is a trade-off between speed or ease of development and maintenance.

I was playing around with this at work today, and found your blog. My current best is:

The binary gcd_odd has the advantage that it works with unsigned numbers and cannot overflow. The plus/minus one is effectively limited to 62-bit numbers, but it's ever so slightly faster. Also, cooler.

Looks really neat, Mike. I also like the way you use __builtin_ctz.

This post is rated quite high for some keywords in google.

Therefore, I want to add a note about gcd2. gcd2 does not return the same results as gcd for all inputs because the multiplication will overflow quite easily without any hints for the caller.

Example (continued to use Java):

System.out.println(gcd(335007449088L, 83751862272L));

System.out.println(gcd2(335007449088L, 83751862272L));

Result:

83751862272 //=2^31*3*13, correct

418759311360 //=2^31*3*5*13, incorrect

Possible solutions:

-change the multiplication to 2 times compare to 0

-check for overflow integer arithmetic (Math.multiplyExact since Java8)

-rewrite to some other form like the one from PP:

I speeded up my version of gcd by about 30% by changing from Euclid’s algorith to Stein’s:

Concerning my previous comment:

I have found an error in the code at line 27.

It should be:

unsigned long long int t = v;