GCD faceoff

GCD faceoff

Written by on 11 April 2012

Topics: Programming

Recently I received an email from a guy named Alexandr who reads the blog. He send me a small piece of C# code with the statement that my Greatest Common Denominator (GCD) algorithm was not as efficient as it could be. As a side note this algorithm is  also known as greatest common factor GCF and highest common factor HCF. To my defense I would like to state that I do believe that even my implementation of the algorithm is sufficiently good to solve most problems and it is a really good enough.

Being the geek that we all are it peeked my interest and I wanted to look a bit into it and test which one was faster as well as open the question for you to give your favorite GCD algorithm.

My algorithm relies on Euclid’s algorithm which is described on wikipedia and the implementation looks as

private static long gcd(long a, long b) {
    long y, x;

    if (a > b) {
        x = a;
        y = b;
    } else {
        x = b;
        y = a;
    }

    while (x % y != 0) {
        long temp = x;
        x = y;
        y = temp % x;
    }

    return y;
}

So my no means a short piece of code. The code Alexandr sent me looks the following

private static long gcd2(long x, long y) {
    while (x * y != 0) {
        if (x >= y) x = x % y;
        else y = y % x;
    }
    return (x + y);
}

So if nothing else, it is definitively smaller in terms of lines of code and therefore I liked it. Also it avoids the temporary variable to interchange x and y. The email provided me with Alexandr’s test results which showed that on his computer the result of running multiple times gave us the following results

00:00:04.5850623
00:00:03.4624319

with my GCD algorithm being the upper one which is 33% slower than the one he proposed. So that is indeed pretty nice. My own results show a similar result for running it the given number of times, but increasing the number of times lowers the gap, and at running it for 1-10.000 for both x and y reduced the difference to 14%. But still a good improvement. I know for sure that it has been changed in my library.

I know there are more efficient methods out there, so my question to you is how do you calculate the GCD?

19 Comments For This Post I'd Love to Hear Yours!

  1. Kristian says:

    I have already gotten the first response for this post which was a mail from Ram giving me a GCD algorithm using recursion. It looks like

    private static long gcd(long a, long b) {
        long r;
    
        r = a % b;
        if (r == 0)
            return b;
        else if (r == 1)
            return 1;
        else
            return gcd3(b, r);
    }
    

    According to my quick testing it is faster than two algorithms in the article. So nicely done with yet another implementation build upon Euclid’s algorithm.

  2. SuprDewd says:

    Wow. That’s pretty cool. I always thought I had a pretty fast GCD function:

    long gcd3(long x, long y)
    {
    	while (y != 0)
    	{
    		x %= y;
    		x ^= y;
    		y ^= x;
    		x ^= y;
    	}
    
    	return x;
    }

    But after some tests, mine seems to be the slowest, and Alexandr’s the fastest.
    I also tried improving his function and managed to make it a tiny bit faster:

    long gcd4(long x, long y) {
        while (x != 0 && y != 0) { // changed this line
            if (x >= y) x = x % y;
            else y = y % x;
        }
        return (x + y);
    }

    Another improvement is to use int instead of long when you can.

  3. SuprDewd says:

    Faster:

    long gcd6(long x, long y)
    {
        while ((x & y) != 0)
        {
            if (x >= y) x %= y;
            else y %= x;
        }
    
        return x + y;
    }

    Even faster in C++:

    long long gcd6(long long x, long long y)
    {
        while ((x & y) != 0)
        {
            if (x >= y) x %= y;
            else y %= x;
        }
    
        return x + y;
    }

    Maybe there is an even faster way through prime factors of the numbers (I doubt it, but maybe for big numbers or some special cases).
    I did some Google-ing and found this paper which might use some other method.

  4. SuprDewd says:

    The C++ version above should have:

    while (x & y)

    instead of

    while ((x & y) != 0)
  5. SuprDewd says:

    Shortest GCD:

    int gcd8(int a, int b) {
    	return b == 0 ? a : gcd8(b, a % b);
    }

    Also, according to my tests, Alexandr’s code is faster than Ram’s code.

  6. Kristian says:

    Wow, this post has really surprised me. I got an email with a faster version of GCD than what I used and with a bit of tweaking by others who have seen the code, we are now clocking in a 40% lower running time according to my test. That is really impressive work by you guys.

  7. Kristian says:

    According to Wikipedia, due to it’s increased complexity combined with the dedicated hardware for integer division, the gain should be rather small. But I guess the best way is to implement it and see, At least it sounds like a good candidate.

  8. Steve says:

    The one I had been using is

    static long Gcd(long num1, long num2) {
        if (num1 > num2) {
            num1 %= num2;
            if (num1 == 0) return num2;
        }
        do {
            num2 %= num1;
            if (num2 == 0) return num1;
            num1 %= num2;
            if (num1 == 0) return num2;
        } while (true);
    }
    

    but Ram’s, with a few additional optimizations seems better

    static long Gcd4(long a, long b) {
        long r;
        if (b > a) {
            r = b % a;
            if (r == 0L) return a;
            if (r == 1L) return 1L;
            b = r;
        }
        while (true) {
            r = a % b;
            if (r == 0L) return b;
            if (r == 1L) return 1L;
            a = b;
            b = r;
        }
    }
    

    Have you found even faster ones since making this post? I tried the binary GCD linked here earlier, but it was a fair amount slower than these ones based on mod.

  9. Kristian says:

    No I haven’t looked into speeding it up more. It never seems to the GCD part that is taking the time.

  10. Khaur says:

    Hi!
    gcd6 has a typo: x & y should be x && y, otherwise gcd(11,4) returns 15, which is obviously wrong (damn you, binary and!)

    In my tests, compiling with -O3, Ram’s was the fastest by about 1.5%.

    I added the test into Alexandr’s as well as changing the loop from a while to a do while (since x and y shouldn’t be null) and now it’s marginally faster (<1%) than Ram's. Here's how it looks:

    long long gcd(long long x, long long y)
    {
        do
        {
        if (x >= y){
          x %= y;
        }
        else {
          y %= x;
        }
        if(y==1 || x==1)
          return 1;
        }while (x && y);
    
        return x + y;
    }
    

    Overall this is about a 10% improvement on what I used before.

  11. PP says:

    private static int gcd(int u, int v)
    {
    for(; ; )
    {
    u %= v;
    if (u == 0) return v;
    v %= u;
    if (v == 0) return u;
    }
    }

    // I think the first mod operation with u<v is fast, didn'test it.
    // Compare to zero is fast.
    // Faster: Knuth's TAOCP 4.5.2

  12. Lee Yiyuan says:

    Hello! I’ve been visiting your blog very often (usually when I get stuck on projecteuler) and I must say, this website is very well done!

    I’m not good at mathematical algorithms (GCD algorithms, in this case) but I do have some comments to make about the implementation of the GCD algorithms in your post as well as in the replies to your post.

    Firstly, recursive methods crunch up a huge amount of memory space. This is especially so if you were to pass arguments to it recursively by value, because whenever it is called recursively, the objects are copied (because of it being passed by value). Of course, this doesn’t matter much for small numbers.

    Secondly, if you were to be optimizing your algorithms, I would suggest that you write a native library instead, and create a managed wrapper around this native library (perhaps using C++ CLI). The last time I wrote a matrix library on both native C++ and managed C#, the native library performed about 25 times faster in terms of repeated 10×10 matrix multiplication, even after marshalling.

  13. Kristian says:

    Hi Lee

    Thank you. I am glad you find it enlightening. I encourage people to solve it on their own, and once people have solved it read the writeup I have done. I also know how annoying it is to get stuck on something though.

    Regarding the recursiveness. Yes it does have a larger memory footprint, I am not sure how smart compilers are regarding that today.

    Regarding the native implementation I fully agree with you, but for such small computations as gcd I doubt that you will gain much.

    However, I would love to see the differences you point out implemented and tested.

  14. Lee Yiyuan says:

    Hi Kristian,

    I’ve written up a simple implementation of a managed wrapper around native methods that does GCD computations. It seems that the resultant performance was around twice as fast as managed methods.

    You can check it out here! : http://codemyroad.wordpress.com/2013/06/20/native-code-vs-managed-code-performance-and-development/

  15. Kristian says:

    Thanks for the write-up it is an interesting read. And it does make sense. Both you comment about speed, but also the amount of code. So like almost everything else, it is a trade-off between speed or ease of development and maintenance.

  16. Mike says:

    I was playing around with this at work today, and found your blog. My current best is:

    // returns the largest odd factor of x
    long long odd_part(long long x) {
      return x >> __builtin_ctzl(x);
    }
    
    // returns the GCD of two odd numbers
    long long gcd_odd(long long x, long long y) {
      while(1) {
        long long t = x-y, u = x+y;
        if (!t||!u) return llabs(x);
        x = odd_part(t);
        y = odd_part(u);
      }
    }
    
    // returns the GCD of two numbers
    long long gcd(long long x, long long y) {
      if (x==0) return y;
      else if (y==0) return x;
      int sx = __builtin_ctzl(x), sy = __builtin_ctzl(y);
      return gcd_odd(x>>sx,y>>sy) <y) {
          x = oddpart(x-y);
        } else if (x<y) {
          y = oddpart(y-x);
        } else {
          return x;
        }
      }
    }
    

    The binary gcd_odd has the advantage that it works with unsigned numbers and cannot overflow. The plus/minus one is effectively limited to 62-bit numbers, but it's ever so slightly faster. Also, cooler.

  17. Bjarki Ágúst says:

    Looks really neat, Mike. I also like the way you use __builtin_ctz.

  18. tb says:

    This post is rated quite high for some keywords in google.
    Therefore, I want to add a note about gcd2. gcd2 does not return the same results as gcd for all inputs because the multiplication will overflow quite easily without any hints for the caller.

    Example (continued to use Java):
    System.out.println(gcd(335007449088L, 83751862272L));
    System.out.println(gcd2(335007449088L, 83751862272L));

    Result:
    83751862272 //=2^31*3*13, correct
    418759311360 //=2^31*3*5*13, incorrect

    Possible solutions:
    -change the multiplication to 2 times compare to 0
    -check for overflow integer arithmetic (Math.multiplyExact since Java8)
    -rewrite to some other form like the one from PP:

        static long gcd_xyz(long x, long y) {
            if (y &gt; x)
                return gcd5(y, x);
            while (true) {
                x %= y;
                if (x == 0)
                    return y;
                y %= x;
                if (y == 0)
                    return x;
            }
        }
    

Trackbacks For This Post

  1. Native Code VS Managed Code – Performance and Development | Code My Road For My Own Tomorrow

Leave a Comment Here's Your Chance to Be Heard!

You can use these html tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

You can use short tags like: [code language="language"][/code]

The code tag supports the following languages: as3, bash, coldfusion, c, cpp, csharp, css, delphi, diff,erlang, groovy, javascript, java, javafx, perl, php, plain, powershell, python, ruby, scala, sql, tex, vb, xml

You can use [latex]your formula[/latex] if you want to format something using latex

Notify me of comments via e-mail. You can also subscribe without commenting.