 # GCD faceoff

Recently I received an email from a guy named Alexandr who reads the blog. He send me a small piece of C# code with the statement that my Greatest Common Denominator (GCD) algorithm was not as efficient as it could be. As a side note this algorithm is  also known as greatest common factor GCF and highest common factor HCF. To my defense I would like to state that I do believe that even my implementation of the algorithm is sufficiently good to solve most problems and it is a really good enough.

Being the geek that we all are it peeked my interest and I wanted to look a bit into it and test which one was faster as well as open the question for you to give your favorite GCD algorithm.

My algorithm relies on Euclid’s algorithm which is described on wikipedia and the implementation looks as

```private static long gcd(long a, long b) {
long y, x;

if (a > b) {
x = a;
y = b;
} else {
x = b;
y = a;
}

while (x % y != 0) {
long temp = x;
x = y;
y = temp % x;
}

return y;
}
```

So my no means a short piece of code. The code Alexandr sent me looks the following

```private static long gcd2(long x, long y) {
while (x * y != 0) {
if (x >= y) x = x % y;
else y = y % x;
}
return (x + y);
}
```

So if nothing else, it is definitively smaller in terms of lines of code and therefore I liked it. Also it avoids the temporary variable to interchange x and y. The email provided me with Alexandr’s test results which showed that on his computer the result of running multiple times gave us the following results

```00:00:04.5850623
00:00:03.4624319
```

with my GCD algorithm being the upper one which is 33% slower than the one he proposed. So that is indeed pretty nice. My own results show a similar result for running it the given number of times, but increasing the number of times lowers the gap, and at running it for 1-10.000 for both x and y reduced the difference to 14%. But still a good improvement. I know for sure that it has been changed in my library.

I know there are more efficient methods out there, so my question to you is how do you calculate the GCD? ### Posted by Kristian Kristian

I have already gotten the first response for this post which was a mail from Ram giving me a GCD algorithm using recursion. It looks like

```private static long gcd(long a, long b) {
long r;

r = a % b;
if (r == 0)
return b;
else if (r == 1)
return 1;
else
return gcd3(b, r);
}
```

According to my quick testing it is faster than two algorithms in the article. So nicely done with yet another implementation build upon Euclid’s algorithm. SuprDewd

Wow. That’s pretty cool. I always thought I had a pretty fast GCD function:

```long gcd3(long x, long y)
{
while (y != 0)
{
x %= y;
x ^= y;
y ^= x;
x ^= y;
}

return x;
}```

But after some tests, mine seems to be the slowest, and Alexandr’s the fastest.
I also tried improving his function and managed to make it a tiny bit faster:

```long gcd4(long x, long y) {
while (x != 0 && y != 0) { // changed this line
if (x >= y) x = x % y;
else y = y % x;
}
return (x + y);
}```

Another improvement is to use int instead of long when you can. SuprDewd

Faster:

```long gcd6(long x, long y)
{
while ((x & y) != 0)
{
if (x >= y) x %= y;
else y %= x;
}

return x + y;
}```

Even faster in C++:

```long long gcd6(long long x, long long y)
{
while ((x & y) != 0)
{
if (x >= y) x %= y;
else y %= x;
}

return x + y;
}```

Maybe there is an even faster way through prime factors of the numbers (I doubt it, but maybe for big numbers or some special cases).
I did some Google-ing and found this paper which might use some other method. SuprDewd

The C++ version above should have:

`while (x & y)`

`while ((x & y) != 0)` SuprDewd

Shortest GCD:

```int gcd8(int a, int b) {
return b == 0 ? a : gcd8(b, a % b);
}```

Also, according to my tests, Alexandr’s code is faster than Ram’s code. Kristian

Wow, this post has really surprised me. I got an email with a faster version of GCD than what I used and with a bit of tweaking by others who have seen the code, we are now clocking in a 40% lower running time according to my test. That is really impressive work by you guys. SuprDewd Kristian

According to Wikipedia, due to it’s increased complexity combined with the dedicated hardware for integer division, the gain should be rather small. But I guess the best way is to implement it and see, At least it sounds like a good candidate. Steve

The one I had been using is

```static long Gcd(long num1, long num2) {
if (num1 > num2) {
num1 %= num2;
if (num1 == 0) return num2;
}
do {
num2 %= num1;
if (num2 == 0) return num1;
num1 %= num2;
if (num1 == 0) return num2;
} while (true);
}
```

but Ram’s, with a few additional optimizations seems better

```static long Gcd4(long a, long b) {
long r;
if (b > a) {
r = b % a;
if (r == 0L) return a;
if (r == 1L) return 1L;
b = r;
}
while (true) {
r = a % b;
if (r == 0L) return b;
if (r == 1L) return 1L;
a = b;
b = r;
}
}
```

Have you found even faster ones since making this post? I tried the binary GCD linked here earlier, but it was a fair amount slower than these ones based on mod. Kristian

No I haven’t looked into speeding it up more. It never seems to the GCD part that is taking the time. Khaur

Hi!
gcd6 has a typo: x & y should be x && y, otherwise gcd(11,4) returns 15, which is obviously wrong (damn you, binary and!)

In my tests, compiling with -O3, Ram’s was the fastest by about 1.5%.

I added the test into Alexandr’s as well as changing the loop from a while to a do while (since x and y shouldn’t be null) and now it’s marginally faster (<1%) than Ram's. Here's how it looks:

```long long gcd(long long x, long long y)
{
do
{
if (x >= y){
x %= y;
}
else {
y %= x;
}
if(y==1 || x==1)
return 1;
}while (x && y);

return x + y;
}
```

Overall this is about a 10% improvement on what I used before. PP

private static int gcd(int u, int v)
{
for(; ; )
{
u %= v;
if (u == 0) return v;
v %= u;
if (v == 0) return u;
}
}

// I think the first mod operation with u<v is fast, didn'test it.
// Compare to zero is fast.
// Faster: Knuth's TAOCP 4.5.2 Lee Yiyuan

Hello! I’ve been visiting your blog very often (usually when I get stuck on projecteuler) and I must say, this website is very well done!

I’m not good at mathematical algorithms (GCD algorithms, in this case) but I do have some comments to make about the implementation of the GCD algorithms in your post as well as in the replies to your post.

Firstly, recursive methods crunch up a huge amount of memory space. This is especially so if you were to pass arguments to it recursively by value, because whenever it is called recursively, the objects are copied (because of it being passed by value). Of course, this doesn’t matter much for small numbers.

Secondly, if you were to be optimizing your algorithms, I would suggest that you write a native library instead, and create a managed wrapper around this native library (perhaps using C++ CLI). The last time I wrote a matrix library on both native C++ and managed C#, the native library performed about 25 times faster in terms of repeated 10×10 matrix multiplication, even after marshalling. Kristian

Hi Lee

Thank you. I am glad you find it enlightening. I encourage people to solve it on their own, and once people have solved it read the writeup I have done. I also know how annoying it is to get stuck on something though.

Regarding the recursiveness. Yes it does have a larger memory footprint, I am not sure how smart compilers are regarding that today.

Regarding the native implementation I fully agree with you, but for such small computations as gcd I doubt that you will gain much.

However, I would love to see the differences you point out implemented and tested.

[…] while ago I received a reply to a comment I made on Kristian’s blog (http://www.mathblog.dk/gcd-faceoff/#comment-106652). I was asked to illustrate the differences between the speed of native code and managed code. […] Lee Yiyuan

Hi Kristian,

I’ve written up a simple implementation of a managed wrapper around native methods that does GCD computations. It seems that the resultant performance was around twice as fast as managed methods.

You can check it out here! : http://codemyroad.wordpress.com/2013/06/20/native-code-vs-managed-code-performance-and-development/ Kristian

Thanks for the write-up it is an interesting read. And it does make sense. Both you comment about speed, but also the amount of code. So like almost everything else, it is a trade-off between speed or ease of development and maintenance. Mike

I was playing around with this at work today, and found your blog. My current best is:

```// returns the largest odd factor of x
long long odd_part(long long x) {
return x >> __builtin_ctzl(x);
}

// returns the GCD of two odd numbers
long long gcd_odd(long long x, long long y) {
while(1) {
long long t = x-y, u = x+y;
if (!t||!u) return llabs(x);
x = odd_part(t);
y = odd_part(u);
}
}

// returns the GCD of two numbers
long long gcd(long long x, long long y) {
if (x==0) return y;
else if (y==0) return x;
int sx = __builtin_ctzl(x), sy = __builtin_ctzl(y);
return gcd_odd(x>>sx,y>>sy) <y) {
x = oddpart(x-y);
} else if (x<y) {
y = oddpart(y-x);
} else {
return x;
}
}
}
```

The binary gcd_odd has the advantage that it works with unsigned numbers and cannot overflow. The plus/minus one is effectively limited to 62-bit numbers, but it's ever so slightly faster. Also, cooler. Bjarki Ágúst

Looks really neat, Mike. I also like the way you use __builtin_ctz. tb

This post is rated quite high for some keywords in google.
Therefore, I want to add a note about gcd2. gcd2 does not return the same results as gcd for all inputs because the multiplication will overflow quite easily without any hints for the caller.

Example (continued to use Java):
System.out.println(gcd(335007449088L, 83751862272L));
System.out.println(gcd2(335007449088L, 83751862272L));

Result:
83751862272 //=2^31*3*13, correct
418759311360 //=2^31*3*5*13, incorrect

Possible solutions:
-change the multiplication to 2 times compare to 0
-check for overflow integer arithmetic (Math.multiplyExact since Java8)
-rewrite to some other form like the one from PP:

```    static long gcd_xyz(long x, long y) {
if (y &gt; x)
return gcd5(y, x);
while (true) {
x %= y;
if (x == 0)
return y;
y %= x;
if (y == 0)
return x;
}
}
``` Richard

I speeded up my version of gcd by about 30% by changing from Euclid’s algorith to Stein’s:

```/*
see: http://lemire.me/blog/2013/12/26/fastest-way-to-compute-the-greatest-common-divisor/
calculate GCD of a and b. This version is a bit faster than the old one.
If using gcc compiler use __builtin_ctzll instead of _BitScanForward64
I use MS Visual Studio.

Note use of long long ints to avoid overflow.

u and v are UNSIGNED, caller must use llabs or similar if necessary.
Note: mathematically gcd(a,b) = gcd(-u,v) = gcd(u,-v) = gcd (-u,-v)
Note: GCD (greatest common denominator) is also known as HCF
(Highest Common Factor).
*/
unsigned long long int gcd(unsigned long long int u, unsigned long long int v)
{
bool result;
int shift, su, sv;
if (u == 0) return v;
if (v == 0) return u;
result = _BitScanForward64(&shift, u | v );
result = _BitScanForward64(&su, u);
u >>= su;
do {
result = _BitScanForward64(&sv, v);
v >>= sv;
if (u > v) {
unsigned int t = v;
v = u-v;
u = t;
}
else
v = v - u;
} while (v != 0);
return u << shift;
}

/* using Euclid's algorithm */
unsigned long long int gcd_old(unsigned long long int a, unsigned long long int b) {

if (a == 0) return b;
if (b == 0) return a;
for (;;) {
a %= b;
if (a == 0) return b;

b %= a;
if (b == 0) return a;
}
}
``` Richard

Concerning my previous comment:

I have found an error in the code at line 27.
It should be:

unsigned long long int t = v;