Project Euler 103: Investigating sets with a special subset sum property.

Problem 103 of Project Euler is a hard problem if we look at the amount of people who solved it.  And I tend to agree, it took me a while to solve it. The problem reads

Let S(A) represent the sum of elements in set A of size n. We shall call it a special sum set if for any two non-empty disjoint subsets, B and C, the following properties are true:

1. S(B) ≠ S(C); that is, sums of subsets cannot be equal.
2. If B contains more elements than C then S(B) > S(C).

If S(A) is minimised for a given n, we shall call it an optimum special sum set. The first five optimum special sum sets are given below.

n = 1: {1}
n = 2: {1, 2}
n = 3: {2, 3, 4}
n = 4: {3, 5, 6, 7}
n = 5: {6, 9, 11, 12, 13}

It seems that for a given optimum set, A = {a1a2, … , an}, the next optimum set is of the form B = {ba1+ba2+b, … ,an+b}, where is the “middle” element on the previous row.

By applying this “rule” we would expect the optimum set for n = 6 to be A = {11, 17, 20, 22, 23, 24}, with S(A) = 117. However, this is not the optimum set, as we have merely applied an algorithm to provide a near optimum set. The optimum set for n = 6 is A = {11, 18, 19, 20, 22, 25}, with S(A) = 115 and corresponding set string: 111819202225.

Given that A is an optimum special sum set for n = 7, find its set string.

Before starting on deriving a solution I would like to say that I made a solution using brute force, a really ugly solution. So when I saw another solution I borrowed the majority of it and made some modifications to it in order to speed it up a bit. So if the coding style is not exactly the same as I usually present this is the probably the reason. But lets get started

Solution strategy

We are given a method that will give us near optimum sets. So I have made the assumption that the actual solution is in the vicinity of that solution. In our case we can use the algorithm and obtain A = {20, 31, 38, 39, 40, 42, 45}. So we will look for a solution within +/- 3 of each element. This we will do by brute forcing our way through all the possibilities.

Let us call the so far best set for A’. For each generated set B we need to test three things. First of all we need to test if the sum of B < A’ otherwise we might as well skip it.

The other two things we need to test are whether or not the proposed set has the described properties 1 and 2.

Testing property 1

The way we do this is to generate all subsets of the set B and then compare them to see if two disjoint sets have the same sum. Since we are looking at a set of 7 numbers there are 2^7 = 128 subsets. Since in each set a number can either be present or not be present.  In order to generate these subsets we will use a method of using the bits in the counter in order to check if a given element of the set is also a member of the subset. This is equivalent to the method we used in Problem 18 for the bruteforce solution.

This is implemented as

```private int[] MakeSubsetSums(int[] a) {
int[] b = new int[(int)Math.Pow(2, a.Length)];
for (int i = 1; i < b.Length; i++) {
b[i] = MakeSubsetSum(a, i);
}
return b;
}

private int MakeSubsetSum(int[] a, int m) {
int sum = 0;
for (int i = 0; i < a.Length; i++) { if ((m & 1) == 1) { sum += a[i]; } m >>= 1;
}
return sum;
}
```

As always when using bit shift operations it can be a bit difficult to read. But basically we just check whether each number should be in the given set by checking the binary representation of the index.

Once we have all the sets, we can verify Rule 1 by going checking if there are disjoint sets with the same sum.

We do this with the following method

```private bool VerifyRule1(int[] a, int dim) {
for (int i = 0; i < a.Length; i++) { int k = i; while (k != -1) { k++; if (k >= a.Length)
break;
k = Array.IndexOf(a, a[i], k);
if (k != -1 && Disjoint(i, k))
return false;
}
}
return true;
}
```

Where the Disjoint method is a very simple method

```private bool Disjoint(int n, int m) {
//Checks if the subsets are disjoint
return (n & m) == 0;
}
```

As you can see going verifying rule 1 is rather costly since we need to generate all subsets and using IndexOf operation rather often. However, it is the best I can come up with.

Verifying Rule 2

Verifying rule 2 is a lot less costly as it can be done in a clever way. If we sort the entries of the set such that a1a2< … < an we can check that the property holds by checking that

a1 +a2 > an and a1 + a2 + a3> an-1an and so on. For n=7 this requires 3 checks which can be implemented as

```private bool VerifyRule2(int[] a) {
//Assume that the array is sorted
int sum1 = a[0];
int sum2 = 0;

for (int i = 0; i < a.Length / 2; i++) {
sum1 += a[i + 1];
sum2 += a[a.Length - i - 1];

if (sum1 <= sum2)
return false;
}
return true;
}
```

This is unlike rule 1 a rather inexpensive operation.

Putting it all together

Now that we can verify each rule let us put the main loop together.  This is implemented as

```// base vector
int[] a = new int[] { 20, 31, 38, 39, 40, 42, 45 };
int min = -3;
int max = 3;
string result = "";

int[] t = new int[a.Length];
int dim = a.Length;
int[] s; // sum vector
int[] c = new int[7]; // pertubation vector
for (int i = 0; i < c.Length; i++) {
c[i] = min;
}

int tsum = 0;
int tmin = int.MaxValue;

for (int i = 0; i < (int)Math.Pow(max - min + 1, dim); i++) {
for (int j = 0; j < a.Length; j++) {
t[j] = a[j] + c[j];
}
tsum = TotSum(t);

if (tsum <= tmin) {
Array.Sort(t);
if (VerifyRule2(t)) {
s = MakeSubsetSums(t);
if (VerifyRule1(s, dim)) {
tmin = tsum;
result = PrintVect(t);
}
}
}

c = UpdatePertubationVector(c, min, max);
}
```

Where we start by generating a candidate based on the initial case and a pertubation vector. Then we check if the total sum is smaller than the current best solution. This is done with a helper method you can find in the source code. It is just summing the array.

Then we verify rule 2 and then rule 1 if the first holds. If we have found a new best we then mark that. At last we update the perturbation vector and searches the search goes on.

Wrapping up

This problem contained a lot of helper functions. Both because the problem has a lot of things that needs to be checked but also because I borrowed the structure from another solution. Executing it gives us the following result

```The optimum sss string is 20313839404245
Solution took 1658,5656 ms
```

This solution is not really fast, and more interestingly you can find the exact solution by the approximate algorithm… Well who would have thought?

I have skipped several trivial function which you can find in the attached source code. How did you solve it? I know that the problem is very related to Problem 105 and 106 and it is quite possible that I will get some insights from that. But at the time of writing I still haven’t solved those.

Posted by Kristian

anon

You could also use the near optimum set rule provided on the answer for n=6.

Kristian

Yep, that was my conclusion to the problem as well. However, I thought that wouldn’t be the answer.

Tom Leslie

The following does not consider “efficient” ways to generate candidate 7-element sets – I’m still working on that: it only really covers the most efficient(?) way to verify rule 1:

********************************************************************

If one first verifies rule 2 that S(A)>S(B) if the number of elements of A is greater than the number of elements of B, then this restricts the checks which have to be performed to verify rule 1.

Consider some random subset X with x elements, then

1) If rule 2 has been verified it must be true that S(X)>S(A) where A has fewer than x elements

2) If rule 2 has been verified it must be true that S(X)=4 because with a 7-element starting set, there can be no subsets with 4 elements which are disjoint from a candidate 4 element subset.

6) Net result of the above argument is that one only has to consider whether rule 1 applies for x=2 and do the same for x=3.

Somewhat embarrassed to admit that I came up with the above before I came up with the fast method (as yours) to check rule 2!

The above argument can be extended for problem 105 (haven’t done this yet): eg for a 12-element set one only has to consider rule 1 for subsets with up to 6 elements

rgds
Tom

Tom Leslie

Hmmm something glitched and bullets 3-5 got lost – see below

3) The only case not covered is those disjoint sets which have the *same* number of elements as the candidate X.

4) One doesn’t really have to consider the case of x=1, because if the starting set actually has 7 *distinct* elements then rule 1 must be obeyed.

5) It is also unnecessary to consider the case where x>=4 because with a 7-element starting set, there can be no subsets with 4 elements which are disjoint from a candidate 4 element subset.

rgds
Tom

Filipp Riabchun (@hypnos_phi)

When checking the rule 2 you can compare only the pair of the biggest subsets (first 4 vs last 3 in the 7-element case). All the other comparisons can be derived from this one as we basically subtract some lower value from the “bottom” part (which is proven to be greater at that point) and some higher value from the “top” part. Obviously, the inequality will hold in that case.

Carl J Appellof

If you used the rule on the “near optimum” set for n=6, your limits of -3/+3 would not have reached the true optimum. A limit of -4 is needed. I’m pretty uncomfortable with this method of searching, since if you don’t start near the optimum, you’ll never find it.
Also, you’ll note that your method of perturbing the input set often produces sets with non-distinct values.

I haven’t come up with with a better method of searching, though.