 # Project Euler 72: How many elements would be contained in the set of reduced proper fractions for d ≤ 1,000,000

I don’t think it is a coincidence that this exact problem comes right now as we shall see in the solution of the problem. Problem 72 of Project Euler reads

Consider the fraction, n/d, where n and d are positive integers. If n<d and HCF(n,d)=1, it is called a reduced proper fraction.

If we list the set of reduced proper fractions for d ≤ 8 in ascending order of size, we get:

1/8, 1/7, 1/6, 1/5, 1/4, 2/7, 1/3, 3/8, 2/5, 3/7, 1/2, 4/7, 3/5, 5/8, 2/3, 5/7, 3/4, 4/5, 5/6, 6/7, 7/8

It can be seen that there are 21 elements in this set.

How many elements would be contained in the set of reduced proper fractions for d ≤ 1,000,000?

The first thing to note is that in here HCF is the same as the greatest common divisor or GCD as I usually use. The second thing to note is that we are probably looking for a very large number, but let us take a look at the problem.

## Restating the problem

Let us start with a smaller problem. How do we determine the number of reduced proper fractions of one number i. Well a proper reduced fraction is a fraction where the numerator and denominator has a greatest common divisor of 1, meaning they are relatively prime. So all numerators which are relatively prime to and smaller than i  are proper reduced fractions. This is the exact definition of Euler’s totient function, so the answer to that question is φ(i).

This means we can restate the original question to find the answer to

## Solving this with a sieve

Euler’s totient function of a given i can be calculated as

Where p|i means all the prime factors p of i. It means we have to find all distinct prime factors for all numbers up to 1.000.000. Quite a task unless we do something a bit clever.  Luckily we can use a clever method. We can use a sieving method just like the sieve of Eratosthenes.

In a sieve of Eratosthenes we make a list of all numbers up to the limit we are interested in.  We then start at the first prime and remove any multiple of that since any multiple will be a composite number with the found prime as a factor. We then move on to the next number which is not removed and remove any multiple of that. We need to apply the same technique here. However, instead of removing multiples of the primes we need to multiply them with .   Doing that means we will eventually do that with all prime factors of every number.

The clever thing about this is that primes are easily identifiable since there are no smaller prime factors that divides a prime, once we come to the number it will still be the full quantity.

I have made the sieve implementation in C# as

int limit = 1000000;
int[] phi = Enumerable.Range(0, limit+1).ToArray();
long result = 0;
for(int i = 2; i <= limit; i++){ if (phi[i] == i) { for (int j = i; j <= limit; j += i) { phi[j] = phi[j] / i * (i - 1); } } result += phi[i]; } [/csharp]

## Wrapping up

My main problem with this solution is that the execution of it gives me

There are 303963552391 proper fractions with denominator up to 1000000
Solution took 34 ms


which is a longer solution time than I would like. However, I think the solution approach is really neat and thus I am satisfied. As usual you can find the source code for the problem here. Comments, questions or other solutions are as always very welcome. ### Posted by Kristian SuprDewd

This is exactly how the length of a Farey sequence is calculated, so I won’t have to post my code 🙂

Also, the code is bugged again. Kristian

The code should be fixed now. I have no clue why it does that once in a while. But thanks for letting me know.

I actually didn’t know that was the easiest way to calculate the length of a Farey sequence, but now I got a little bit more enlightened.

/Kristian Rory

Number-theoretically, one can use the formula (here, we are interested in the quantity |F_t|) just above Proposition 4 in http://arxiv.org/abs/0801.4966 . Chris

This method is really clever. I approached the problem in the same way (calculating the order of U(n) – the field of units in Z_n = phi(n)). However, your method of doing it by modifying the sieve is the leap that I didn’t make. Speeded up my solution 1000-fold. Kristian

Hi Chris

Thanks for the compliment, glad to hear that you liked the solution method. If you want, feel free to share your code. I would love to see other solutions.

/Kristian Philipp

It is a very clever solution and it took me much longer to understand it than I would have liked 😉 I had not done proper research so it hadn’t even occured to me that it was possible to calculate lots of phi-functions simultaniously; it is quite obvious once you think about it in the right way – thanks for the shove 🙂 Kristian

Hi Philipp

Thanks, I am just glad that you took something with you that you can hopefully use in other problems. I have revised the post a bit after your comment, to explain a few of the things which were obvious to me when I wrote the post, but I can see now that I should spend a few word explaining.

/Kristian Avidan

You got me confused, because there is actually a mistake in the formula describing the calculation of ϕ(i). It should be multiplied by i in the end, like here:

But once I got it, your method is really cool. I solved it using prime sieveing and calculating ϕ(i) separately, and your solution is really faster! 🙂 Kristian

We can’t have that can we. So I updated the post with your correction. Thank you. And glad you figured it out anyway. Khaur

I found this formula on the wikipedia page for the Farey sequence: $|F_n| = \frac{n+3)n}{2}-\sum_{d=2}^n|F_{\lfloor n/d\rfloor}|$

So I made a recursive function, caching the results, that’s only about 2000 values that need to be computed (1999 exactly, there seems to be a pattern). I also factorised the $|F_{\lfloor n/d\rfloor}|$ that occurred multiple times when computing a single value. Code goes as such:

long long fareySize(long long N){
static vector<long long> cache(N+1,0);
if(cache[N])
return cache[N];
else{
if(N==1){
cache=2;
return 2;
}
long long sum = (N+3)*N/2;
double sqN = sqrt(N);
// All j=N/i <sqrt(N) occur N/j-N/(j+1) times (at least once)
// (Mind the implicit truncating)
for(long long j = 1; j<sqN;j++){
sum-=(N/j-N/(j+1))*fareySize(j);
}
// Count the rest
for(long long i = 2; i<=sqN;i++){
sum-=fareySize(N/i);
}
// We might have counted a value twice (not pretty but it works)
if(floor(N/floor(sqN)) == ceil(sqN)-1)
sum+=fareySize(ceil(sqN)-1);
cache[N]=sum;
return sum;
}
}


Runs in 10ms(+-5ms) on my computer. A lot of the time is spent because of the recursion, so testing the cache before calling might speed it up a bit more, at the cost of obfuscating the code. If someone can find an iterative version that computes only what is needed, that would be even better.

I’ll keep that sieve adaptation method in mind, it might be handy at some point! Bjarki Ágúst

Hi Kaur.

Here is an iterative version. For whatever reason, it takes 10 seconds, and the answer is 2 off.

#include <iostream>
#include <cmath>
using namespace std;

int main()
{
int n = 1000000 + 1;
long long* arr = new long long[n];
arr = 2;

for (int i = 2; i <= n; i++)
{
arr[i] = (static_cast<long long>(i) + 3) * i / 2;
double sqN = sqrt(i);
for (int j = 1; j < sqN; j++)
{
arr[i] -= (i/j - i/(j+1)) * arr[j];
}

for (int j = 2; j <= sqN; j++)
{
arr[i] -= arr[i / j];
}

if (static_cast<int>(floor(i / floor(sqN))) == static_cast<int>(ceil(sqN) - 1))
{
arr[i] += arr[static_cast<int>(ceil(sqN)) - 1];
}
}

cout << arr[n-1] << endl;

return 0;
} Khaur

The answer is 2 off because the wikipedia article counts 0/0 and 1/1 in the sequence, whereas project Euler doesn’t. I believe the problem with that iterative version is that you compute all the values, when we’re not even going to use most of them (for example all the values from 500001 to 999999 are useless).

It seems there are $2\lfloor\sqrt{N}\rfloor-2$ values that are required, being able to enumerate them in ascending order should give an efficient iterative algorithm. Khaur

I just realised that we don’t really need an explicit way to enumerate the required values, since it’s not hard to compute the set:

long long fareySize(long long N){
double sqN = sqrt(N);
//Find out which values we have to compute
set<long long> left({N}); // Values to develop
set<long long> required;
while(!left.empty()){
long long m = *left.rbegin(); // Biggest element
left.erase(m);
for(long long i=2;i<m/(long long)sqN;i++){
left.insert(m/i);
required.insert(m/i);
}
}
//Insert the trivial values
for(long long i = 2;i<=sqN;i++)
required.insert(i);
required.insert(N);

vector<long long> fareySizes(N+1,0);
fareySizes=2;

//Compute the required values from the lowest to N
for(long long m:required){
double sqM = sqrt(m);
long long sum = (m+3)*m/2;
// N/i are all different for i<=sqrt(N)
for(long long i = 2; i<=sqM;i++){
sum-=fareySizes[m/i];
}
// Factorise the values that occur multiple times
for(long long j = 1; j<sqM;j++){
sum-=(m/j-m/(j+1))*fareySizes[j];
}
// We might have counted a value twice
if(floor(m/floor(sqM)) == ceil(sqM)-1)
sum+=fareySizes[ceil(sqM)-1];
fareySizes[m]=sum;
}
return fareySizes[N];
}


This breaks the time precision for me with N=1,000,000 (<5ms). Bjarki Ágúst

Nice. 🙂 Atilla Aliyev

🙁 I worked to find the method for a week.Finally, I found the method and I wrote code.But then,I examined your solution and it exactly same with mine,all.I imagined that find me only.Good solution.It’s perfect method,congratulations.