Recently I received an email from a guy named Alexandr who reads the blog. He send me a small piece of C# code with the statement that my Greatest Common Denominator (GCD) algorithm was not as efficient as it could be. As a side note this algorithm is also known as greatest common factor GCF and highest common factor HCF. To my defense I would like to state that I do believe that even my implementation of the algorithm is sufficiently good to solve most problems and it is a really good enough.
Being the geek that we all are it peeked my interest and I wanted to look a bit into it and test which one was faster as well as open the question for you to give your favorite GCD algorithm.
My algorithm relies on Euclid’s algorithm which is described on wikipedia and the implementation looks as
private static long gcd(long a, long b) {
long y, x;
if (a > b) {
x = a;
y = b;
} else {
x = b;
y = a;
}
while (x % y != 0) {
long temp = x;
x = y;
y = temp % x;
}
return y;
}
So my no means a short piece of code. The code Alexandr sent me looks the following
private static long gcd2(long x, long y) {
while (x * y != 0) {
if (x >= y) x = x % y;
else y = y % x;
}
return (x + y);
}
So if nothing else, it is definitively smaller in terms of lines of code and therefore I liked it. Also it avoids the temporary variable to interchange x and y. The email provided me with Alexandr’s test results which showed that on his computer the result of running multiple times gave us the following results
00:00:04.5850623 00:00:03.4624319
with my GCD algorithm being the upper one which is 33% slower than the one he proposed. So that is indeed pretty nice. My own results show a similar result for running it the given number of times, but increasing the number of times lowers the gap, and at running it for 1-10.000 for both x and y reduced the difference to 14%. But still a good improvement. I know for sure that it has been changed in my library.
I know there are more efficient methods out there, so my question to you is how do you calculate the GCD?





I have already gotten the first response for this post which was a mail from Ram giving me a GCD algorithm using recursion. It looks like
private static long gcd(long a, long b) { long r; r = a % b; if (r == 0) return b; else if (r == 1) return 1; else return gcd3(b, r); }According to my quick testing it is faster than two algorithms in the article. So nicely done with yet another implementation build upon Euclid’s algorithm.
Wow. That’s pretty cool. I always thought I had a pretty fast GCD function:
long gcd3(long x, long y) { while (y != 0) { x %= y; x ^= y; y ^= x; x ^= y; } return x; }But after some tests, mine seems to be the slowest, and Alexandr’s the fastest.
I also tried improving his function and managed to make it a tiny bit faster:
long gcd4(long x, long y) { while (x != 0 && y != 0) { // changed this line if (x >= y) x = x % y; else y = y % x; } return (x + y); }Another improvement is to use int instead of long when you can.
Faster:
long gcd6(long x, long y) { while ((x & y) != 0) { if (x >= y) x %= y; else y %= x; } return x + y; }Even faster in C++:
long long gcd6(long long x, long long y) { while ((x & y) != 0) { if (x >= y) x %= y; else y %= x; } return x + y; }Maybe there is an even faster way through prime factors of the numbers (I doubt it, but maybe for big numbers or some special cases).
I did some Google-ing and found this paper which might use some other method.
The C++ version above should have:
instead of
Shortest GCD:
int gcd8(int a, int b) { return b == 0 ? a : gcd8(b, a % b); }Also, according to my tests, Alexandr’s code is faster than Ram’s code.
Wow, this post has really surprised me. I got an email with a faster version of GCD than what I used and with a bit of tweaking by others who have seen the code, we are now clocking in a 40% lower running time according to my test. That is really impressive work by you guys.
http://en.wikipedia.org/wiki/Binary_GCD_algorithm
Do we have a winner?
According to Wikipedia, due to it’s increased complexity combined with the dedicated hardware for integer division, the gain should be rather small. But I guess the best way is to implement it and see, At least it sounds like a good candidate.
The one I had been using is
static long Gcd(long num1, long num2) { if (num1 > num2) { num1 %= num2; if (num1 == 0) return num2; } do { num2 %= num1; if (num2 == 0) return num1; num1 %= num2; if (num1 == 0) return num2; } while (true); }but Ram’s, with a few additional optimizations seems better
static long Gcd4(long a, long b) { long r; if (b > a) { r = b % a; if (r == 0L) return a; if (r == 1L) return 1L; b = r; } while (true) { r = a % b; if (r == 0L) return b; if (r == 1L) return 1L; a = b; b = r; } }Have you found even faster ones since making this post? I tried the binary GCD linked here earlier, but it was a fair amount slower than these ones based on mod.
No I haven’t looked into speeding it up more. It never seems to the GCD part that is taking the time.
Hi!
gcd6 has a typo: x & y should be x && y, otherwise gcd(11,4) returns 15, which is obviously wrong (damn you, binary and!)
In my tests, compiling with -O3, Ram’s was the fastest by about 1.5%.
I added the test into Alexandr’s as well as changing the loop from a while to a do while (since x and y shouldn’t be null) and now it’s marginally faster (<1%) than Ram's. Here's how it looks:
long long gcd(long long x, long long y) { do { if (x >= y){ x %= y; } else { y %= x; } if(y==1 || x==1) return 1; }while (x && y); return x + y; }Overall this is about a 10% improvement on what I used before.