Project Euler 14: Longest Collatz Sequence

Written by on 10 December 2010

Topics: Project Euler

Project Euler is asking a question regarding the Collatz Conjecture in Problem 14

The problem reads

The following iterative sequence is defined for the set of positive integers:

nn/2 (n is even)
n
→ 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 40 20 10 5 16 8 4 2 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

The conjecture, as the name states has yet not been proved, and states that no matter what number you start with, using the above mentioned rules, you will eventually end up with 1. My favourite comic xkcd, rewrote the conjecture a bit but I think it holds as well in that version.

I propose a straight forward simple implementation, and a version that caches the results and thereby speeds up the execution. As far as I see it, there is no possibility to make short cuts in the search, we have to calculate the sequence for all numbers below 1.000.000

Direct approach

The direct approach is really straight forward – Make a loop over all numbers, run the sequence until it converges, and count the length. My code looks as

 const int number = 1000000;

long sequenceLength = 0;
long startingNumber = 0;
long sequence;

for (int i = 2; i <= number; i++) {
    int length = 1;
    sequence = i;
    while (sequence != 1) {
        if ((sequence % 2) == 0) {
            sequence = sequence / 2;
        } else {
            sequence = sequence * 3 + 1;
        }
        length++;
    }

    //Check if sequence is the best solution
    if (length > sequenceLength) {
        sequenceLength = length;
        startingNumber = i;
    }
}

and the result is

The starting number 837799 produces a sequence of 525.
Solution took 2352 ms

Nothing fancy it just takes a while.

Caching

There is nothing too mysterious over the caching method either. I just store the results I have already calculated.  I use an array as cache, since I know the necessary and sufficient cache size (needs to store 1.000.000 results) ahead of execution. At the cost of approximately a 1MB of RAM.

Since I start from the smallest number, I know that every time my sequence of numbers decreases below the starting number, I have already calculated the remaining starting length. So for even numbers that happens after the first iteration.

For the example given in the problem description

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
In the third iteration we hit the number 10, which is smaller than 13, and thus I have already calculated the sequence length for the number 10, so I can just add the length of the sequence I have already went through to get the total result.

The code I have used looks as

const int number = 1000000;

int sequenceLength = 0;
int startingNumber = 0;
long sequence;

int[] cache = new int[number + 1];
//Initialise cache
for (int i = 0; i < cache.Length; i++) {
    cache[i] = -1;
}
cache[1] = 1;

for (int i = 2; i <= number; i++) {
    sequence = i;
    int k = 0;
    while (sequence != 1 && sequence >= i) {
        k++;
        if ((sequence % 2) == 0) {
            sequence = sequence / 2;
        } else {
            sequence = sequence * 3 + 1;
        }
    }
    //Store result in cache
    cache[i] = k + cache[sequence];

    //Check if sequence is the best solution
    if (cache[i] > sequenceLength) {
        sequenceLength = cache[i];
        startingNumber = i;
    }
}

And the result is

The starting number 837799 produces a sequence of 525.
Solution took 113 ms

So it is a significant speed up compared to the direct approach

Wrapping up

I tried a version where I store all the intermediate results of each sequence as well, but the overhead of doing so, actually slowed the solution down, more than the speed up I got from the decreased number of iterations.

As usual the source code is available for download.  If you have improvements ideas for the algorithm I am very curious to hear about them.

Update: Based on the discussion I have made another caching strategy which also caches larger numbers. It runs a bit faster. You can find the source code for that in the download.

41 Comments For This Post I'd Love to Hear Yours!

  1. hesham says:

    Hello,

    First i would like to thank you on your great blog
    the post is great as usual but when you wrote in the cashing code

    while (sequence != 1 && sequence >= i) {

    who is this possible the two conditions are inconsistent with each other ,anyway thanks again and keep the great work

    Bye.

  2. hesham says:

    sorry for that i didn’t notice

  3. Kristian says:

    Thanks a lot for the comment, I love getting feedback.

    I guess by your second comment that you noticed that the first condition contains a 1 (number 1) while the second contains an i (letter i).

    I know they can be a bit difficult to spot.

    /Kristian

  4. donnyton says:

    Your solution seems to work, and you probably have a computer much more powerful than mine to get those times, but there is a far superior way to cache:

    Suppose we calculate the 13-sequence as given above. Later on we will see the following for 15:

    15 -> 46 -> 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40 -> …

    But back when we calculated 13, we could have already stored the value for 40! But your solution would not have stored 40 yet, since it goes in order. It seems clear that if we store EVERY value we calculate in any sequence, we can draw upon a far bigger pool of precalculated numbers.

    The caveat is that storing the numbers in order requires a fixed amount of space, but storing arbitrary numbers in the sequence would require an unreasonably sized array. Why? Even as early as 77671, we find a number like 1047216490 in the middle of the sequence. Therefore as a final step, we must swap out the array for a hash table instead. The final solution calculates the formula until it finds a value that’s already in the hash table. It then backtracks, recursively adding every previous number in the sequence into the table as well.

    This affords us huge savings in speed, as we store not only numbers smaller than us, but numbers far bigger, and by the time you reach the hundreds of thousands, many of the values will have already been calculated in their entirety by the lower sequences and return immediately.

  5. Kristian says:

    Hi Donnyton

    Thanks for your elaborate comment, it means a lot to me to get feedback, I learn a lot from other people.

    You are right, I have a fairly fast computer, but for some problems it is not about spending 100ms or 10s, it is about whether it is solvable or not withing a reasonable waiting time. For other problems such as this one, it is more about comparing solutions to one another.

    I have been thinking along the same lines as you regarding the caching of larger numbers such as 40 in the example you mention. But when I tried to solve the problem the overhead of administering a sequence and updating the cache according to it was far greater than what I gained from it. When I pursued that strategy I used a list to keep track of the sequence I was developing, and the overhead of that was far too great. If nothing else I guess I will have to revisit the problem based on your input.

    I never thought of using hashtables or it though. It might be a much better solution than the one I pursued. If you have a solution which is faster than the one I have presented, I would be very eager to see it.

    /Kristian

  6. SuprDewd says:

    This is my implementation of donnyton’s suggestion.
    I’m not sure if this is exactly what he had in mind, but it works.
    I use a Dictionary instead of a HashTable, because I can’t see how I’m supposed to store the associated chain length along with it.
    But if it’s not the way you imagined it, donnyton, please post your code and let us take a look.
    Here it goes.

    public static void DonnytonProblem14()
    {
    	Stopwatch clock = Stopwatch.StartNew();
    
    	Dictionary<long, int> cache = new Dictionary<long, int>();
    	cache.Add(1, 1);
    	cache.Add(2, 2);
    	cache.Add(4, 3);
    
    	long maxNumber = 2;
    	int maxChainLength = 1;
    
    	for (long start = 3; start < 1000000; start++)
    	{
    		List<long> chain = new List<long>();
    		int chainLength = 1;
    		long n = start;
    
    		while (n != 1 && !cache.TryGetValue(n, out chainLength))
    		{
    			chain.Add(n);
    
    			if ((n & 1) == 0) n >>= 1;
    			else n = (n << 1) + n + 1;
    		}
    
    		for (int i = chain.Count - 1; i >= 0; i--)
    		{
    			cache.Add(chain[i], ++chainLength);
    		}
    
    		if (chain.Count > 0 && chainLength > maxChainLength)
    		{
    			maxNumber = chain[0];
    			maxChainLength = chainLength;
    		}
    	}
    
    	clock.Stop();
    
    	Console.WriteLine("The starting number {0} produces a sequence of {1}.", maxNumber, maxChainLength);
    	Console.WriteLine("Solution took {0} ms", clock.ElapsedMilliseconds);
    }
    

    It takes 1250 ms which is the same as Kristian’s non-cached version (on my computer).
    This is my original solution (also about 1250 ms on my computer).

    public static void MyProblem14()
    {
    	Func<Func<long, long>, long, long> collatzCount = (f, i) => i % 2 == 0 ? 1 + f(i / 2) : 1 + f(3 * i + 1);
    	var collatzCountMemoized = collatzCount.Memoize(new Dictionary<long, long> { { 1, 1 } });
    
    	Console.WriteLine(2L.To(999999).Select(N => new { N, Count = collatzCountMemoized(N) }).MaxItem(i => i.Count).N);
    }
    

    The Memoize and MaxItem methods are extension methods I created (not part of .Net).

  7. Kristian says:

    Hi SuprDewd

    I am not close to my own computer, so I can’t dig into the code you posted yet. I will do so when I get home. I think you can use a hashtable for the same thing, the add command takes an key and a value, where the key is the number and the value is the sequence length of the sequence from the key.

    You can then test if the value has already been cahced with containsKey.

    I think the overhead is in the list you are keeping for the current sequence, but that would require a more fine grained profiler than I currently have access to.

    And not least; nice linq implementation you posted.
    /Kristian

  8. donnyton says:

    No need to store the current sequence you are calculating: just use a recursive solution! This solution recursively calls getLength() on each number without storing it until it reaches a cached value. It then unwinds the recursion and caches everything that was in between.

    (Also, I’m not sure how to post code, but I’ll give it a try)

    public class Problem014
    {
    	public static HashMap lengths;
    	
    	public static void main(String[] args)
    	{
    		DebugUtil.defaultOutput.toggle(DebugState.OFF);
    		
    		long end = 1000000L;
    		long longestIndex = 1;
    
    		//avoid calculating numbers we passed
    		lengths = new HashMap();
    		lengths.put(1L, 1L);
    		
    		for(int i = 2; i = lengths.get(longestIndex))
    			{
    				longestIndex = i;
    			}
    		}
    		
    		System.out.println(longestIndex + " with length " + lengths.get(longestIndex));
    	}
    	
    	public static long getLength(long i)
    	{
    		//previously calculated, return immediately
    		if(lengths.containsKey(i))
    		{
    			DebugUtil.defaultOutput.print(null, i + "(" + lengths.get(i) + ")");
    			return lengths.get(i);
    		}
    		else
    		{
    			if(i % 2 == 0)
    			{
    				DebugUtil.defaultOutput.print(null, i + " -> ");
    				lengths.put(i, 1 + getLength(i / 2));
    				return lengths.get(i);
    			}
    			else
    			{
    				DebugUtil.defaultOutput.print(null, i + " -> ");
    				lengths.put(i, 1 + getLength(3 * i + 1));
    				return lengths.get(i);
    			}
    		}
    	}
    }
    

    P.S. SuprDewd, HashTable and Dictionary are essentially the same, with only minor implementation details different.

  9. SuprDewd says:

    Ah, I see. Our implementations are pretty much the same except for differences in bottom-up, top-down ideology.
    And I knew about the HashTable vs. Dictionary. For some reason I thought you were talking about a HashSet :). Dictionary is just a generic version of HashTable (in C#, I don’t know about Java), which should be faster (no boxing/unboxing needed).

  10. SuprDewd says:

    This is pretty much the same as donnyton’s method.
    Still is only 950 ms.

    public static Dictionary<long, int> CachedChainLengths = new Dictionary<long, int>();
    
    public static int ChainLength(long n)
    {
    	if (n == 1) return 1;
    	int chainLength;
    	if (CachedChainLengths.TryGetValue(n, out chainLength)) return chainLength;
    
    	long next = (n & 1) == 0 ? (n >> 1) : (n << 1) + n + 1;
    	chainLength = ChainLength(next) + 1;
    	CachedChainLengths.Add(n, chainLength);
    	return chainLength;
    }
    
    public static void DonnytonProblem14Better()
    {
    	Stopwatch clock = Stopwatch.StartNew();
    
    	long maxNumber = 1;
    	int maxChainLength = 1;
    
    	
    	for (int n = 2; n < 1000000; n++)
    	{
    		int chainLength = ChainLength(n);
    		if (chainLength > maxChainLength)
    		{
    			maxNumber = n;
    			maxChainLength = chainLength;
    		}
    	}
    
    	clock.Stop();
    
    	Console.WriteLine("The starting number {0} produces a sequence of {1}.", maxNumber, maxChainLength);
    	Console.WriteLine("Solution took {0} ms", clock.ElapsedMilliseconds);
    }
    

    But I can’t figure out what you are doing here:

    for(int i = 2; i = lengths.get(longestIndex))
    {
    longestIndex = i;
    }
    

    Is that some clever way of fixing in on the result? This would not compile in C#.

  11. donnyton says:

    Well, it still doesn’t display properly. How do I get that box made for posting code?

  12. SuprDewd says:

    [ java ]
    System.out.println(longestIndex + ” with length ” + lengths.get(longestIndex));
    [ /java]

    Without spaces.

  13. SuprDewd says:

    Kristian posted some formatting tips here

  14. Kristian says:

    Hi Donnyton

    I took the liberty to add some code tags to your post Donnyton. Thanks for posting your solution.

    I see the idea with your solution: it is really a way I hadn’t thought. I like it though. I only have the same question as SuprDewd how to interpret

     for(int i = 2; i = lengths.get(longestIndex))
                {
                    longestIndex = i;
                }
            }
    

    It is not a construction I have seen before.

    /Kristian

  15. donnyton says:

    Still doesn’t appear to work. As a last resort I’ve put the code here:

    http://dl.dropbox.com/u/6566673/Problem014.java

    Please remove my other comments, or figure out a way to format it properly if you can.

  16. Kristian says:

    Hi Donnyton

    You need to remove the white space in the tags. Sorry for the confusion it has caused. have also cleaned up the discussion slightly to make it easier to follow.

    I have taken the liberty to post the code you linked to, and reading it now makes perfect sense and seem like a good strategy. I will try to compare that method to my previous ones.

    public class Problem014
    {
    	public static HashMap<Long, Long> lengths;
    	
    	public static void main(String[] args)
    	{
    		DebugUtil.defaultOutput.toggle(DebugState.OFF);
    		
    		SpeedTimer.start();
    		
    		long end = 1000000L;
    		long longestIndex = 1;
    
    		//avoid calculating numbers we passed
    		lengths = new HashMap<Long, Long>();
    		lengths.put(1L, 1L);
    		
    		for(int i = 2; i < end; i++)
    		{
    			long length = getLength(i);
    			//DebugUtil.defaultOutput.println(" [" + length + "] ");
    			
    			if(length >= lengths.get(longestIndex))
    			{
    				longestIndex = i;
    			}
    		}
    		
    		SpeedTimer.stopSeconds(true);
    		
    		System.out.println(longestIndex + " with length " + lengths.get(longestIndex));
    	}
    	
    	public static long getLength(long i)
    	{
    		//previously calculated, return immediately
    		if(lengths.containsKey(i))
    		{
    			//DebugUtil.defaultOutput.print(null, i + "(" + lengths.get(i) + ")");
    			return lengths.get(i);
    		}
    		else
    		{
    			if(i % 2 == 0)
    			{
    				//DebugUtil.defaultOutput.print(null, i + " -> ");
    				lengths.put(i, 1 + getLength(i / 2));
    				return lengths.get(i);
    			}
    			else
    			{
    				//DebugUtil.defaultOutput.print(null, i + " -> ");
    				lengths.put(i, 1 + getLength(3 * i + 1));
    				return lengths.get(i);
    			}
    		}
    	}
    }
    
  17. Kristian says:

    I have been revisiting this problem with the insights you gave me donnyton, and I tried to implement the same thing as you did using a dictionary. However that executes in 584ms, about 5 times slower than the first cached version I posted. Working with a dictionary is so much more expensive than evaluating a long collatz sequence, after all it is simple arithmetic.

    I liked the idea of caching even larger number, which you made possible with the recursive approach. So I tried to combine the approaches using an array as cache to gain the good performance of an array with the more advanced caching. It resulted in the following code.

    private int[] recArrayCache;
    
    public void SolveWithRecursionArray() {
        Stopwatch clock = Stopwatch.StartNew();
    
        const int number = 1000000;
        int longestSeqIndex = 0;
        int longestSeq = 0;
        int sequenceLength = 0;
    
        recArrayCache = new int[number+1];
        recArrayCache[1] = 1;
    
        for (int i = 2; i <= number; i++) {
            sequenceLength = getArrayLength(i);
    
            //Check if sequence is the best solution
            if (sequenceLength > longestSeq) {
                longestSeqIndex = i;
                longestSeq = sequenceLength;
            }
        }
    
        clock.Stop();
        Console.WriteLine("The starting number {0} produces a sequence of {1}.", longestSeqIndex, longestSeq);
        Console.WriteLine("Solution took {0} ms", clock.ElapsedMilliseconds);
    }
    
    
    private int getArrayLength(long i) {
        int chainLength = 0;
        if (i < recArrayCache.Length && recArrayCache[i] != 0)
            return recArrayCache[i];
    
        if (i % 2 == 0) {
            chainLength = 1 + getArrayLength(i / 2);
        } else {
            chainLength = 1 + getArrayLength(3 * i + 1);
        }
    
        if (i < recArrayCache.Length)
            recArrayCache[i] = chainLength;
        return chainLength;
    }
    

    Which executes in 87ms, so faster than the first method I posted. The source code is included in the source code download

  18. donnyton says:

    But the above solution doesn’t cache all the large numbers. It caches the ones up to a million or so. You can’t really use an array to cache the whole thing because it is near impossible to predict how big the numbers in the sequence will become.

    In any case, this result causes more interesting questions to arise. When do we encounter really big numbers in the middle of Collatz sequences, and what causes them to appear? How many more numbers do we have to store to cache my complete way versus the faster, more selective array method?

  19. Kristian says:

    Hi Donnyton

    Yes, it would be rather interesting to see if what the largest number we encounter when testing sequences below 1.000.000 and how many numbers above 1.000.000 do we encounter. For a start we an easily conclude that any odd number above 333.333 implies that we will reach a number above 1.000.000, so we will make cache misses at least 333.333 times.

    /Kristian

  20. donnyton says:

    After simple tests with 1,000,000:

    Largest number encountered: 56991483520
    Successful caches (your way): 999998
    Missed caches (your way): 1355037
    Total cache (my way): 2168611
    Total calls to getArrayLength (your way): 3355033
    Total calls to getLength (my way): 3168608

    With 10,000,000:

    Largest number encountered: 60342610919632
    Successful caches (your way): 9999998
    Missed caches (your way): 13625963
    Total calls to getArrayLength (your way): 33625959

    With 100,000,000:

    Largest number encountered: 2185143829170100
    Successful caches (your way): 99999998
    Missed caches (your way): 136289344
    Total calls to getArrayLength (your way): 336289340

    The most fascinating result I got out of that was how linear some of those numbers are. In every case, about 70% of getArrayLength() calls tried to cache (the remaining 30% returned from cache immediately), with very low error (+-.1%). Of the calls that tried to cache, around 57% were out of bounds. The number of calls to getArrayLength() was always around 3.36 x the end number.

    The largest number encountered, of course, grows far faster than either of our storage could handle. My solution degrades rapidly in speed for the larger numbers; I didn’t bother waiting for it to finish.

  21. Kristian says:

    Hi Donnyton

    Thanks for posting these statistics, they give some pretty interesting results.

    I think it is pretty interesting that even though there is under 50% of the numbers cached it only costs 6% more calls to the getlength function.

    /Kristian

  22. donnyton says:

    By the way, without spoiling anything if you haven’t solved it, these issues appear again in problem 67. There are once again two ways to do it, each parallel to the way we each did it.

  23. Kristian says:

    I have solved problem 67, but using dynamical programming which is the same approach as Problem 18.

  24. Akshay says:

    guess this might work faster though i didn’t calculate the time…

    #include
    int main()
    {
        long long int j=1,max=0,k=0;
        while(j1)
        {
            if(i&1)
            {
                i=(3*i)+1;
                cnt++;
            }
            while((i&1)==0){
                i=i>>1;
                cnt++;
            }
    
        }
            if(max<cnt){
                max=cnt;
                k=j;
            }
            j++;
        }
        printf("%lld",k);
    }
    
  25. Muhammad says:

    Hi Kristian
    I’ve tried your solution using c++ both caching and direct approach but it’s takes too long time and it doesn’t halt…I have a relatively fast pc..I saw and psuedocode on wikipedia :

    while n > 1

    show n
    if n is odd then

    set n = 3n + 1

    else

    set n = n / 2

    endif

    endwhile
    show n

    and I noticed the line :while(sequence != 1)
    so I changed it to while(sequence > 1)
    it halted fast but gave 910107 as an answer which is wrong can you help me?

  26. Kristian says:

    Hi

    It is really difficult to help you given the information you provide. Assuming your are doing everything in integers, the change you made shouldn’t make a difference.

    If you are doing things in doubles or the like, then that is most likely your problem.

  27. Nikhil says:

    Muhammed,

    You may be facing overflow error. Use long long for the current number you are working with instead of long or int in C++.

  28. Eoin says:

    Hi,

    Nice blog, this is the first post I found but I look forward to checking out more.

    I wrote a blog post on the same Project Euler problem (that’s how I found yours). You can find it here. Here’s my function that does the work:

    uint getCollatzSeriesLength(uint value)
    {
      uint seriesLength;
    
      if ((value <= noTerms) && (lengthsCache[value] != 0)) {
        seriesLength = lengthsCache[value];
      }
      else if (value == 1) {
        seriesLength = 1;
      }
      else {
        if (value%2 == 0) {
          seriesLength = getCollatzSeriesLength(value/2) + 1;
        }
        else {
          seriesLength = getCollatzSeriesLength(value*3+1) +1;
        }
        auto additionalSteps = 0;
        for (auto i=value; i<=noTerms; i*=2) {
          lengthsCache[value] = seriesLength + additionalSteps++;
        }
      }
    
      return seriesLength;
    }
    

    It’s pretty similar to some of the suggestions in the comments using recursion, but I’ve added one more optimisation. When I calculate the series length for n I know that the series length of 2n is one greater. Therefore when I calculate a series length for any number I double it and cache its series length as one greater than the original number. I repeat this for the full range. This optimisation gave me a ~2.3x speedup.

  29. Jean-Marie Hachey says:

    Table 1
    Variation of the starting numbers generating the longest
    sequence lengths for various range limits (1-100 up to 1-20M).
    (Re: Project Euler 14)

    http://img4.hostingpics.net/pics/452647pe14table1.jpg

    Table 1 shows that only one starting number is prime (97).
    In this single case, the starting number is smaller than the
    sequence length.

    ___

    Fig. 1
    Starting numbers generating the longest sequence lengths
    for various range limits (1-100 up to 1-20 000 000).
    (Re: Project Euler 14)

    http://img4.hostingpics.net/pics/405934pe14fig1.jpg

    ___

    Sources:

    1) Project Euler 14
    Kristian’s algorithm
    http://www.mathblog.dk/files/euler/Problem14.cs
    2) Microsoft Visual C# 2010 Express
    3) Microsoft Office Excel (2007)
    4) MathBlog
    Prime Factorization Calculator
    http://www.mathblog.dk/tools/prime-factorization/

  30. Sameer Jain says:

    Hi Kristian,
    This is my implementation of the same non-caching logic as you gave above but it takes more than 2 minutes to solve:

    length = 1
    max_length = 1
    number =1
    
    for i in range(2,1000001):
        length = 1
        x = i
        while(x!=1):
            if(x%2==0):
                x /= 2
            else:
                x = x*3+1
            length += 1
    
        if(length&gt;max_length):
            max_length = length
            number = i
    
    print (number)
    

    I have a fairly fast laptop i7 processor. What might be the reason for the difference in time taken to solve ? Is it because I’m using Python ?

  31. Kristian says:

    Without being a Python expert, it seems you have made the same implementation. I am pretty surprised at your runtime though. Maybe someone can see something I couldn’t.

  32. Sarfaraz says:

    I tried to do what donnyton said.
    This implementation stores the collatz lengths for all the numbers encountered independent of their size. I guess it can be implemented in cpp using map of stl.

    dic={}
    dic[1]=1
    def func(x,c):
            if x in dic:
                    return c+dic[x]
            else:
                    if x%2==0:
                            d=func(x/2,c+1)
                    else:
                            d=func(3*x+1,c+1)
            dic[x]=d-c
            return d
    m,n=0,0
    for i in xrange(2,1000001):
            m=max(m,func(i,0))
            if m!=n:
                    ans=i
            n=m
    print ans
    
  33. Mohammad says:

    When I first wrote the program for this program, I put while(n>1) and gave different answer
    What’s the difference if I put while(n>1) and while(n!=1)?

  34. Jon says:

    For Sameer, Python runs very slowly with that method. You can decrease the time by half by changing the for loop to –> for i in range(1,999999,2)
    as this only hits the odd numbers. I was able to run this in around 60 sec.

    Same thing can be done with Sarfaraz’s method. This took about 1 sec of runtime off for me. (~2.7s)

    In comparison, I was able to use a direct approach with C++ in 0.5 sec.

  35. Kristian says:

    Unless you have a mistake such that the number can be negative or zero then that shouldn’t make a difference.

  36. Svetoslav says:

    Hey Kristian,

    I like your solution, particularly the using of the cache. However, if you want to take even less time use right shift instead of division by two. What I mean is that a / 2 = a >> 1.

    On my machine the average time for 10 000 runs of your solution was ~145ms per run and using right shift the average was ~105ms per run.

    The last thing I want to add, because I like one-liners is that the while loop can look like this:

    while (sequence != 1 && sequence >= i)
    {
    k++;
    sequence = sequence % 2 == 0 ? sequence >> 1 : sequence * 3 + 1;
    }

    Hope this helps 😉

  37. Darryl says:

    My version of this has a few additional optimizations.

    * Stop calculating the chain length when you first find a cached value, not when you hit a number lower than the start of the chain. In your case, these are equivalent. In mine, they are not, as you will see.

    * Once you find the chain length, immediately run through the chain a second time, caching chain lengths as you go. You say that your code slowed down when you did this. Mine sped up. My guess is that you were somehow tracking the numbers you had visited in the chain, such as putting them in a list. Simply recalculating the chain is faster.

    * Similarly, right after finding the chain length, cache the chain lengths for values that are powers of two larger than the chain root. In other words:

                chain_length = cache[i];
                for (unsigned int j = i * 2; j &lt;= MAX_ROOT_VAL; j *= 2) {
                    cache[j] = ++chain_length;
                }
    

    * And finally, thanks to caching the doubled values, we only have to loop through odd numbers (every even number is two times a smaller value we’ve already visited, so its length has already been determined).

    I verified that each of these optimizations actually did speed it up. It’s now 10% to 20% faster than the top-of-chain-only caching you show in your blog post. I agree with an earlier comment that there’s no point in speeding up code that is already fast enough, but I enjoy it.

  38. mr.unease says:

    I don’t know if it’s relevant but in the caching exemples main while loop you have stated that if sequence != 1 the loop should end, then you state that if sequence <= i the loop should end.. so the first statement will occur one time.

    The ONE time it does occur is for 2, which is handled by the second statement as well. I might be mistaken and C++ is not native to me.

    On another note, it might be faster to do some sorting of the list and extract the highest value directly instead of doing 10^6 if comparisons..

    Python handles this sort of problem with with dynamical programming, poorly. It seems that it is because of the list data-type, its a damn sloth. But classes and definitions does not take increase runtime any longer.

    import time

    class LongestCollatzsequence:
    def __init__(self, limit):
    self.time = time.time()
    self.limit = limit + 1
    self.path = [-1] * self.limit
    self.path[1] = 1
    self.ans = 0
    self.run()

    def run(self):
    self.paths()
    self.ans = max(self.path)
    self.end = time.time()

    def paths(self):
    for i in range(2, self.limit):
    self.seq = i
    k = 0
    while self.seq >= i:
    k += 1
    if self.seq % 2 == 0: self.ifEven(self.seq)
    else: self.ifOdd(self.seq)
    self.path[i] = k + self.path[self.seq]

    def ifEven(self, x):
    self.seq //= 2

    def ifOdd(self, x):
    self.seq = (self.seq * 3) + 1

    Ans = LongestCollatzsequence(10 ** 6)
    print ('%s\nTime: %.3f' % (Ans.ans, (Ans.end – Ans.time) * 1000))

  39. Manoj says:

    Is there any way to solve the problem in the reverse way that is starting from 1 and multiplying by 2 and so on such that we can get max no. operations as possigble and solve the problem

  40. Mark says:

    Hi! I want to ask whether a processor affects the code runtime, my computer took 3.72s to solve caching method.

  41. Paul says:

    I just solved this problem tonight and also decided to try a caching solution.

    My code is damn near identical to your version! For some reason it runs about half as slow (250ms), but as far as I can tell we used the exact same algorithm.

    I also upped the cache to about 200 million, and found that 169,941,673 has the long sequence of 954 (found in about 40 seconds). Performance seems to drop off considerably after it gets into the hundred-millions.

    Thanks for writing about this, it’s an interesting problem.

Leave a Comment Here's Your Chance to Be Heard!

You can use these html tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

You can use short tags like: [code language="language"][/code]

The code tag supports the following languages: as3, bash, coldfusion, c, cpp, csharp, css, delphi, diff,erlang, groovy, javascript, java, javafx, perl, php, plain, powershell, python, ruby, scala, sql, tex, vb, xml

You can use [latex]your formula[/latex] if you want to format something using latex

This site uses cookies.