I have found Problem 53 of Project Euler to be a really interesting problem to work with, because there is a brute force solution and then there is a much more elegant solution if you dive into the mathematics behind the question. The problem reads

There are exactly ten ways of selecting three from five, 12345:

123, 124, 125, 134, 135, 145, 234, 235, 245, and 345

In combinatorics, we use the notation,^{5}C_{3}= 10.

In general,

where,and

It is not untiln= 23, that a value exceeds one-million:^{23}C_{10}= 1144066.

How many, not necessarily distinct, values of≤^{n}C_{r}, for 1≤n100, are greater than one-million?

I will present 2 different solution strategies each with 2 different solutions, so lets jump right into it.

## Brute Force solution

The first brute force solution we can approach is pretty simple. We already made a small factorial function, so we can just make two loops and brute force the solution. The only thing is that we need to use the BigInteger class.

The C# code looks like this

int result = 0; const int limit = 1000000; const int nlimit = 100; for (int n = 1; n <= nlimit; n++) { for (int r = 0; r <= n; r++) { if(factorial(n) / (factorial(r)*factorial(n-r)) >= limit) result++; } }

with the factorial function looking like

private BigInteger factorial(int x) { BigInteger y = 1; for (int i = 2; i <= x; i++) { y *= i; } return y; }

It is a really simple solution the only problem is the execution time

Number of values of ncr above 1000000 is 4075 Solution took 116 ms

it takes a total of 116ms for me to solve. And that is less than satisfying for such a simple question. So lets try to do it a bit better.

## Caching the factorials

One thing we can notice from the source code is that we call the factorial function a whole lot of times and each time it is with an argument between 1 and 100, so why not cache the computations and then just read them from the cache during our computation.

Then we could implement the main loop as

int result = 0; const int limit = 1000000; const int nlimit = 100; BigInteger[] fac = factorials(nlimit); for (int n = 1; n <= nlimit; n++) { for (int r = 1; r < n; r++) { if (fac[n] / (fac[r] * fac[n-r]) >= limit) result++; } }

and the factorials function as

private BigInteger[] factorials(int x) { if (x < 0) return null; BigInteger[] y = new BigInteger[x+1]; y[0] = 1; for (int i = 1; i <= x; i++) { y[i] = y[i-1]*i; } return y; }

where the factorials function exploits that we 99! is just 98!*99 and so on.

This speeds up the result to a more satisfactory

Number of values of ncr above 1000000 is 4075 Solution took 5 ms

## A combinatorial approach

Some of you might (not surprisingly, since it is mentioned in the problem description) have noticed that we are dealing with combinatorics, and the formula we are given is the factorial representation of the binomial coefficient.

The binomial coefficient can be arranged as the pascal triangle such that

11 11 2 11 3 3 11 4 6 4 1

Calculating the Pascal triangle means taking the previous the numbers in the previous row just above the number you are trying to calculate and adding them. So we can use this to calculate the binomial coefficient as well, since n is the row number and r is the placement in the row.

One thing we can note is that the numbers keep growing, so if we find a number greater than our limit of 1.000.000, then we don’t have to calculate the exact size, we can just put 1.000.000 in the place. We can still find our result. This means that we can avoid using the BigInteger class, which should give us a bit of a speed up.

It can be implemented in the following way

int result = 0; const int limit = 1000000; const int nlimit = 100; int[,] pascalTriangle = new int[nlimit+1, nlimit+1]; for (int n = 0; n <= nlimit; n++) { pascalTriangle[n, 0] = 1; } for (int n = 1; n <= nlimit; n++) { for (int r = 1; r <= n; r++) { pascalTriangle[n, r] = pascalTriangle[n - 1, r] + pascalTriangle[n - 1, r - 1]; if (pascalTriangle[n, r] > limit) { pascalTriangle[n, r] = limit; result++; } } }

In the code we are representing the pascal triangle a in a two dimensional array such that we get the structure

11 11 2 11 3 3 11 4 6 4 1

The first loop is to initialise the pascal triangle since the left border requires us to take index -1 which does not exist, so we fill in the leftmost ones. The main loop takes care of the rest. We don’t really lower the number of computations, but we avoid BigInteger and the result is

Number of values of ncr above 1000000 is 4075 Solution took 0 ms

So now we can’t really speed it up in a measurable way any more, but let me just do a few more tricks.

## Exploiting symmetry

The pascal triangle is symmetric so we only have to calculate half of it to get the result, this should give us a decent speed up. Furthermore once we hit a number greater than our limit at index r we the numbers keep growing until we reach the middle of the triangle and then it decreases again and we will hit our limit when we reach n-r since the line is symmetrical. That means once we reach our limit we have found (n-r) -r +1 entries that fulfils the requirement and thus we don’t have to calculate the rest of the line.

Using these two properties we can make a C# implementation looking like

int result = 0; const int limit = 1000000; const int nlimit = 100; int[,] pascalTriangle = new int[nlimit + 1, nlimit/2 + 1]; for (int n = 0; n <= nlimit; n++) { pascalTriangle[n, 0] = 1; } for (int n = 1; n <= nlimit; n++) { for (int r = 1; r <= n/2; r++) { pascalTriangle[n, r] = pascalTriangle[n - 1, r] + pascalTriangle[n - 1, r - 1]; if (r == n / 2 && n % 2 == 0) pascalTriangle[n, r] += pascalTriangle[n - 1, r - 1]; if (pascalTriangle[n, r] > limit) { pascalTriangle[n, r] = limit; result += n - 2 * r + 1; break; } } }

The biggest chance in the code is the if clause in line 14-15 which is needed since for when n is even we get a border problem since we need to add something from the uncalculated part of the triangle.

The result of running the code is the same, but if you increase the limit of n you will see significant improvements.

## Wrapping up

I have shown you two different approaches each with two different flavours. The brute force approach, which is heavy on the computational size but doesn’t really take up much memory. A trade off can be made by caching the factorials and get a significant speed up for a small increase in memory usage. The alternative approach is to represent the problem as Pascal’s triangle, which is much faster but heavy on the memory side.

You are welcome to peek at the source code if you like.

Have you found other ways to solve it, or found improvements to the approach I am taking, please let me know through the comments.

The photo used in this post was taken by Steve Ryan who shared it under the creative commons license. So is my alteration of the photo.

My first approach was brute force, using BigInteger, and factorial caching.

Later I improved it by using the Pascal Triangle.

I also tried brute force using only logarithms, which also finished in 0ms.

To improve further on my last post:

can be replaced with:

because of symmetry.

Was the logarithm approach precise enough? You are suddenly dealing with an approximation rather than the correct numbers, since you only have finite precision in the doubles.

Interesting approach to avoiding BigInteger though. I wonder if you could have done the same just using doubles instead of integers.

I was also worried about double not working for this problem, but I got the correct answer.

I’ve been having fun with a BigDecimal class (https://github.com/SuprDewd/SharpBag/blob/master/SharpBag/Math/BigDecimal.cs), where I use the .Net BigInteger class for the mantissa and a 32-bit integer for the exponent, but unfortunately it did not work for this problem. It’s got weird precision errors. Different precision sizes yield different results. It’s still a work in progress, but would have been fun to get the correct answer though.

I think I see one of the problems.

I assume you make a definition of your mantissa to have the decimals starting right after your first digit. However, the computer treats them as integers, so if you add two numbers one with precision of 10 decimals and the other with a precision of 20 decimals, you need to chop off the end of the mantissa of the number with the longer precision, to get the decimals to be aligned.

You might already do that, but that could be an explanation. Interesting project though 🙂

All arithmetic seems to work perfectly.

and the result:

which is correct according to Mathematica.

I use slower digit extraction algorithms rather than faster approximation algorithms, so the calculations should be precise.

returns:

And as you know, 0.999… == 1 but I don’t have enough precision to represent infinite 9’s. So instead I’m going to need to round the last digit. And that might be the problem that’s causing it to fail on this problem.

You just mentioned you had problems with different precision sizes and round off errors. So I thought that might have been the problem, but luckily not 🙂

I managed to fix the bug, and now I get the correct answer. 😀

It was a problem with the comparing.

There’s another way of proceeding across a row. If you know the values are based on three factorials, as you co from nCr to nCr+1, one factorial on the bottom will have one more term, r+1, and the other will have one less term, n-r. Hence your can start from pascalTriangle[n, 0] = 1, and repeatedly do pascalTriangle[n, r+1] = pascalTriangle[n, r] * (n-r)/(r+1)

For example, You can generate the pattern 1,4,6,4,1 by starting from 1 and multiplying by 4/1, then 3/2, then 2/3, then 1/4

You can economise, as stated, by symmetry. But actually symmetrical table values are irrelevant: you can stop counting on each row as soon as you exceed the limit, because the numbers will only increase towards the centre of the triangle. Given we don’t care about the actual values, and each row’s calculation is independent of the the rows above or below now, we can count inwards from the edge of the triangle until either we hit the centre or we get a number > limit. Whereupon we double our row count, for symmetry. Unless we’re in the middle of an even row: subtract 1. At that point we know exactly how many in that row must be greater than the limit, and we can skip to the next row …

Actually, you can throw away the PascalTriangle structure, we just need a running count starting at 0 on the top row, and a running calculation, starting at 1 when we start a new row. Repeat count++ and calc = calc * (n-r)/(r-1) until r>= n/2 or calc > limit. count *=2 (maybe less one for the central case or over-counting when we hit the limit)

Then to the next row, with calc = 1 again …

Hi Derek

Interesting approach to optimize the solution. I think I see where you are going, but if you can provide the code for it as well, I think it will enlighten a whole lot of people 🙂

/Kristian

This method of calculation avoids the need for an array for pascal’s triangle, nor does it require bigints in order to calculate the triang;e’s coefficients, because it involves successive multiplications and divisions all of which have integer answers which remain ‘under control’.

To take one row as an example:

1 *7/1 = 7 *6/2 = 21 *5/3 = 35 *4/4 = 35 *3/5 = 21 *2/6 = 7 *1/7 = 1

i.e. 1 7 21 35 35 21 7 1

Suppose we’re looking for numbers over 20; we can stop when we find 21 i.e. r=2. n-r=5. Therefore numbers 2..5 are over 20. 5-2+1=4. There are 4 numbers over 20. Time for next row. i.e. this combines a multiplication trick with the previous ‘stop as soon as you can’ one. But is does not require a data structure to be created and filled: we’re not really interested in what any of the values are, only whether they’re over-limit.

On each row, we either stop because we’ve hit the middle or because we’ve found a big number.

Here’s the code, in the same style as earlier solutions …

It is also possible to update the current line in place, avoiding the O(N²) space complexity. The trick is that you have to process even and odd iterations differently. Here goes the code (C++) :

There is still some potential for optimisation, but I feel this is good enough to move on.

Right, so if we remove the memory complexity as well, I do believe you are right. We are done then 🙂

Nice trick though.

thank U very much for making this site!

I had a problem in pascal solution but now it’s solved! 🙂

regards!

I’m doing this in Python and I’m having trouble getting the correct answer. I seem to be two numbers off and I can’t for the life of me figure out why this is. My code is below if you’d like to take a look.

`# Integer -> Integer`

# Find the factorial of this number

def fact(n):

result = 1

for x in range(2,n+1):

result *= x

return result

# Integer Integer -> Integer

# Calculate n choose r

def choose(n,r):

return fact(n) / (fact(r)*fact(n-r))

# Integer -> Integer

# Find the number of instances of n choose r which exceed 1 million

def chooseMillions(n):

count = 0

for r in range(0,n+1):

if choose(n,r) > 1000000:

count += 1

return count

# Integer Integer -> Integer

# Count the number of instances of n choose r which exceed 1 million for n in the given range

def chooseAllMillions(lo,hi):

result = 0

for x in range(lo,hi):

result += chooseMillions(x)

return result

`print chooseAllMillions(1,101)`

The Pascal triangle only needs to be updated up to the last coefficient that does not exceed the threshold. This simplifies the code quite a bit.

With `hpt[]` (‘half a Pascal triangle’) initialised to the left 11 terms of the 23rd row of a Pascal triangle, the code becomes:

int k = 10, count = (23 + 1) – 2 * k;

for (int n = 24; n 0; –j)

hpt[j] += hpt[j – 1];

while (hpt[k – 1] > THRESHOLD)

–k;

count += (n + 1) – 2 * k;

}

Console.WriteLine(count);

Thanks for this marvellous series of articles on the Euler problems – keep them coming!

What if , if we do binary search for n at fix value of r.