Project Euler 106: Find the minimum number of comparisons needed to identify special sum sets.

Unlike Problem 105 which I wasn’t too impressed with Problem 106 of Project Euler has really lead me to gain some insights into the special sum sets. The problem reads

Let S(A) represent the sum of elements in set A of size n. We shall call it a special sum set if for any two non-empty disjoint subsets, B and C, the following properties are true:

  1. S(B) ≠ S(C); that is, sums of subsets cannot be equal.
  2. If B contains more elements than C then S(B) > S(C).

For this problem we shall assume that a given set contains n strictly increasing elements and it already satisfies the second rule.

Surprisingly, out of the 25 possible subset pairs that can be obtained from a set for which n = 4, only 1 of these pairs need to be tested for equality (first rule). Similarly, when n = 7, only 70 out of the 966 subset pairs need to be tested.

For n = 12, how many of the 261625 subset pairs that can be obtained need to be tested for equality?

Let me warn you upfront that this will be a rather long post, as it will detail many of the insights I got while fiddling around with this problem in order to understand it. If this hasn’t scared you away I will suggest that you go pick up a cup of tea or coffee and settle down for the ride.

Counting the number of comparisons

When I started out, I didn’t even know how they found the 25 sets for n=4, so that was the first thing I set out to figure out. This was done rather quickly though. We are looking for pairs of subsets of the set n=4.  This is rather simple combinatorics problem. The question is how many different ways can we make two subsets out of the set {1,2,3,4}

We can have two subsets of 1. We can make a subset of 1 and one subset of two and so on. All in all we can make the combination sizes

1-1, 1-2, 1-3, 2-2 and 3-1

We have 4 choose 1 * 3 choose 1 = 12 combinations of the first kind. However, we don’t really care if we pick the first subset as {1} and second as {2} or the other way around. Since we have all combinations twice we need to divide this number with 2.

This is also the case for the 2-2 combinations. And the 3-1 combinations will yield the exact same as the 1-3 combinations. This gives us a total of

1-1: (4 choose 1 * 3 choose 1) / 2 = 6
1-2: (4 choose 1 * 3 choose 2)  = 12
1-3: (4 choose 1 * 3 choose 3) = 4
2-2: (4 choose 2 * 2 choose 2) /2 = 3

Or a total of 25 possibilities.

The same goes for n=7 where we have

1-1: (7 choose 1 * 6 choose 1) / 2 = 21
1-2: (7 choose 1 * 6 choose 2)  = 105
1-3: (7 choose 1 * 6 choose 3) = 140
1-4: (7 choose 1 * 6 choose 4) = 105
1-5: (7 choose 1 * 6 choose 5) = 42
1-6: (7 choose 1 * 6 choose 6) = 7
2-2: (7 choose 2 * 5 choose 2) /2 = 105
2-3: (7 choose 2 * 5 choose 3) = 210
2-4: (7 choose 2 * 5 choose 4) = 105
2-5: (7 choose 2 * 5 choose 5) = 21
3-3: (7 choose 3 * 4 choose 3)/2 = 70
3-4: (7 choose 3 * 4 choose 4) = 35

Which gives a total of 966 which it should. Okay, so this we have nailed down now.

Weeding out the majority of combinations

I will go back to the problem with n=4 to gives some insight into what sets we don’t have to check.

The first group of subset pairs I would is the 1-1 pairs. Since we know that the elements of the original set are strictly increasing, the two subset pairs can never be equal. That rules out the first 6 sets, only 18 to go.

The second general rule we can use, is that we know the set satisfies rule 2. So any subset pair which contains different number of elements cannot be equal, since that would conflict with rule 2. That rules out the subset pairs 1-2 and 1-3. Which leaves us with the 3 subset pairs in the group 2-2.

Investigating the pairs of same size

For the n=4 case there are a total of 6 combinations. But 3 of those are mirror versions, so we need to check the 3 of them to figure out the solution we have the following

{1,2} {3,4}
{1,3} {2,4}
{1,4} {2,3}

In the first combination we can see that each element in the first subset can be paired up with an element of the second subset which is larger. Such that we have 1 and 3, 2 and 4. The same goes for the second combination where we can pair up 1 and 2, as well as 3 and 4. Since each element in the first subset is smaller than a corresponding element in the second subset the total sum of the first set must be smaller than the sum of the second set. However the last combination we can’t pair the sets up like that and therefore we can’t say anything about it and thus we must check it.

This gives us the general idea of what we can weed out. However, we still have no idea how to expand this. We need to expand it in two directions. One direction is what happens to the 2-2 subsets in the case of n=5,N=6… The second direction is what happens to the 3-3 subsets and so on.

Increasing n

Let us treat the first case where we increase n. I did this brute force by writing all combinations in a text file. You can download the files here if you like. They are not very interesting, but it did help finding a pattern.

The first and obvious thing is that there are 5 choose 2 * 3 choose 2 = 30 combinations for n = 5. Half of those are mirror versions, so we can discard them. However the interesting observation is that we can choose a set of 4 in 5 choose 4 = 5 ways. The choices are


For each of these 5 ways we can group the 4 chosen elements into 2 sets in the 3 different ways described above (it is 6 but you know only half of them are interesting due to mirroring) .

For the second of the sets above this will be

{1,2} {3,5}
{1,3} {2,5}
{1,5} {2,3}

exactly as in the case of n= 4. This means that there is only 1 out of those 3 is interesting.

I confirmed this with the file for n=6. Which means that for an arbitrary n, and for the 2-2 subset we need to check

However, why we only need to check 1 out of three is still a mystery, so let us go on trying to expand this to the 3-3 subsets.

Increasing subset size

I did this investigation with the 3-3 subset. You can download the files in the same zip file as above. First of all we have

Of that only 10 of these will be interesting and in exactly 5 of those we can pair each element in the second subset with a smaller element in the first subset. So the magic number here seems to be 5 for the 3-3 subsets.

Somewhere into my analysis I realized that this had ties to Problem 15 where we had to find the  number of ways we could go through a grid if we only went down and right. Writing all these A’s and B’s in the files was how I got the link. Only that we are only interested in paths where we stay in the lower triangle of the grid, meaning that the number of rights are never larger than the number of downs at any given point in the sequence.

I knew someone must have solved this problem before me, so I started to search for this combinatorics problem of the grid. And lo and behold Google turned up Catalan numbers which is exactly the sequence I was looking for. In the wiki article the problem I described earlier is mentioned as monotonic paths.

The formula for these Catalan numbers is given as

The final formula for subsets of the same size

So now we know how to find the number of subsets we need to investigate x(n,s) both when we increase the size of the set (n) and when we increase the subset size (s).  This can be  written as

where is given as

For this specific problem we only need to do the calculations for n = 12 and s = 2, 3,4,5 and 6.

Implementing the solution

Now that we have the math nailed all we need is to do is to implement it. My implementation of Catalan numbers looks like

private int CatalanNumber(int n) {
    return Choose(2 * n, n) / (n + 1);

where the choose function is used before, but can be seen in the source code.

The main function looks like

int result = 0;
int n = 12;

for (int i = 2; i <= n / 2; i++) {
    result += Choose(n, i) * Choose(n - i, i) / 2;
    result -= Choose(n, 2 * i) * CatalanNumber(i);

and it all runs in

The number of subsets needed to check 21384
Solution took 0,1093 ms

Actually I think this could be done by hand within the given time limit of 1 minute once we found the formulas.

Once again the source code can be found here. And not least, let me know how you solved the problem.

Posted by Kristian


Bjarki Ágúst

Just amazing! Well done. You should definitely be proud of this one.

I bruteforced this one. Went through all [latex]3^{12}[/latex] subset pairs and incremented a counter if it needed to be tested.

int n = 12,
	cnt = 0;

for (int i = 1, il = pow(3, n); i < il; i++)
	int lc = 0,
		rc = 0,
		lh = 0,
		rh = 0,
		t, j;

	t = i; j = 1;
	while (t)
		int cur = t % 3;
		if (cur == 1) lc++, lh = j;
		else if (cur == 2) rc++, rh = j;
		t /= 3; j++;

	if (lc == 0 || rc == 0) continue;
	if (lh > rh) continue;
	if (lc != rc) continue;

	vector<int> l, r;

	t = i; j = 1;
	while (t)
		if (t % 3 == 1) l.push_back(j);
		else if (t % 3 == 2) r.push_back(j);
		t /= 3; j++;

	bool ok = false;
	for (int k = 0; k < lc; k++)
		if (l[k] > r[k])
			ok = true;

	if (!ok) continue;

printf("%d\n", cnt);

It runs in 150ms, but I might be able to lower it a little bit with a small optimization I have in mind.

But out of curiosity, I told Mathemetica to simplify
[latex]\displaystyle \sum_{i=2}^{n/2}\binom{n}{i}\binom{n-i}{i}/2-\binom{n}{2i}\binom{2i}{i}/(i+1)[/latex]
and it returned
[latex]\displaystyle \frac{1}{2} \left(\, _2F_1\left(\frac{1}{2}-\frac{n}{2},-\frac{n}{2};1;4\right)-2 \, _2F_1\left(\frac{1}{2}-\frac{n}{2},-\frac{n}{2};2;4\right)+1\right)[/latex]
where [latex]_2F_1[/latex] is the Hypergeometric function. From what I’ve read in this paper, pg. 29, your original code is probably much simpler than if you were to compute this formula. Just found it interesting, so I thought I’d share.

Thanks you. Yes it took some time to get the thoughts right about this, but I am really happy with the result.

That being said, I think it is fun to see a brute force approach as well, since I didn’t really know how to iterate through all the combinations. I don’t know what to say to the last part. But I would never have come up with that on my own, and I would never be able to actually read it again if I had coded it like that. But fun to see and good to get some new math to look into.

You have a typo in the formula for x(n,s). It should read “…-C_s \binom{n}{2s}” instead of “…-C_s \binom{n}{s}”. The code for this part however is correct.

Thanks for noticing. It is corrected now.


I think I found another small typo. Under the “increasing n” section the formula for x seems to have the first half post-multiplied by an extra 2.

And an interesting thing I noticed is that if you change the C(n,s) in the final formula to C(n,n-s) then both parts have a C(a,b)*C(b,c) term in them. It makes some factorials cancel out, but I’m not sure that it means anything.

Hi Dav

That seems indeed to be a bug, and it is now fixed.

The second part you are right about, but I can’t see any obvious ways to exploit it in a useful way.

I approached the choosing of sets in a different way: I started by making one set of size 2s. Sets B = {A1, A2, A3, A4} or B = {A5, A7, A11, A12} are completely equivalent for the purpose of this problem, so I wanted to remove n from the problem as early as possible.

So where you have (n,s) * (n-s, s), I have (n, 2s) * (2s, s).

At this point I did a lot of pen-and-paper calculations and eventually came up with a recursive summation that was faster to code than to simplify further. But the formula you found for Catalan numbers looks very much like my version of the count of total subsets.

Seeing that those expressions are equivalent, things start factoring out of your “final” formula.
1/2 (n,s)(n-s, s) – 1/(s+1) (2s,s) (n, 2s)
= 1/2 (n, 2s)(2s, s) – 1/(s+1) (2s,s)(n, 2s)
= (n, 2s)(2s, s) (1/2 – 1/(s+1))

A pretty simple approach, relating to Dyck words:

From Wikipedia, a Dyck word is a word of length 2n consisting of n X’s and n Y’s such that no initial segment of the string has more Y’s than X’s. E.g. XXXYYY, XYXXYY, XYXYXY, XXYYXY, XXYXYY

If we have an ordered set of unique integers, the Dyck words represent the situations which we do NOT need to check the subset sums (X indicates a number in set #1, Y a number in set #2). Not surprisingly, the number of Dyck words of length 2n is equal to Catalan(n).

The total number of words of length 2n with equal number of X’s and Y’s is Choose(2n, n)/2, so the number of non-Dyck words is:
Choose(2n, n)/2 – Catalan(n)

Finally, given a set of N integers, there are Choose(N, 2n) ways to select a set of 2n unique integers. We need to consider n up to floor(N/2). This leaves:
# checks = sum(n=1..N/2, Choose(N,2n)*[Choose(2n,n)/2-Catalan(n)] )

Storing a Pascal’s triangle up to size N for computing the combinations makes this computation near instantaneous.

To find the number of pairs of subsets, there’s a pretty easy formula using the principle of inclusion exclusion. On top of the 2 subsets, look at the remaining unused numbers. These numbers form a third subset. Therefore, each number can be in subset 1 (s1), subset 2 (s2), or the unused subset(s3). For a set of size n, there are 3^n ways to divide these numbers between the subsets.

However, these include cases where s1 or s2 are empty. For the case where s1 is empty, there are 2 remaining subsets for the numbers to fall into, or 2^n combinations. Same goes for s2.

Finally since we double counted the case where both s1 and s2 are empty, we subtracted out 1 extra. So we add 1 back. And since s1 and s2 are indistinguishable, we divide the whole thing by 2. So our formula is (3^n-2^n-2^n+1)/2, or (3^n+1)/2-2^n.

Work around for hackerrank? Most of the test cases have large inputs and fails to satisfy time constraint. Thanks

Leave a Reply