Project Euler 22: What is the total of all the name scores in the file of first names?

Written by on 12 January 2011

Topics: Project Euler

Project Euler has a large variety in the sort of problems they pose. Problem 22, as we will solve now, has nothing to do with mathematics and everything to do with computer science and sorting algorithms. The problem reads

Using names.txt (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 x 53 = 49714.

What is the total of all the name scores in the file?

The way I want to solve this problem, is to split it into three parts

  1. Read the input file and turn the data into a manageable data structure
  2. Sort the data
  3. Sum up and provide the answer
So this post will be split into three parts with most focus on the second part, as that is where the meat is. I will swap the sections a bit, such that I treat section 1 and 3 first to set up the framework. At last I will shortly treat two different sorting algorithms plus the .NET API way for sorting.

Reading the input

Reading a file for its content is a simple procedure which we have done before, but in order to get the data into a usable format, we need to make a bit of effort about the data structure in the file.

These are the notes I have made about the data structure in the file

  1. There are no line breaks
  2. All names are encapsulated in quotation marks – these needs to be stripped
  3. All names are separated by a comma.
  4. All names seems to be completely in upper case

With these facts/assumptions, it is fairly easy to write a small method that provides an output which is  a string array with just the names in all upper case. In C# this method can look as

private string[] readInput(string filename) {
    StreamReader r = new StreamReader(filename);
    string line = r.ReadToEnd();
    r.Close();

    string[] names = line.Split(',');

    for (int i = 0; i < names.Length; i++) {
        names[i] = names[i].Trim('"');
    }

    return names;
}

I read all of the data, split it around the comma and trim off quotation marks.

Summing Up the values

In this section there are two pieces of code, one piece for is the main method which calls the input reader, and sums up all the names. The other piece of code is the method for finding the value of the individual name. And lets start with the main method which is pretty straight forward

string filename = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + "\\names.txt";
string[] names = readInput(filename);
int sumResult = 0;

//Insert some sorting here

for (int i = 0; i < names.Length; i++) {
    sumResult += (i + 1) * sum(names[i]);
}

It is a really simple piece of code that does as described in the beginning of the post; read the input, sorts the names and sums them up. I have not inserted the sorting part yet, as the implementation is a bit different depending on the method, but we will get back to that later.

The summing function is really simple. I converts each character to the ASCII value, which for A is 65, B is 66 and so on. So by subtracting 64, you get the value asked for in the problem description. It can be implemented as

private int sum(string name) {
    int result = 0;
    for (int i = 0; i < name.Length; i++) {
        result += Convert.ToInt32(name[i]) - 64;
    }
    return result;
}

Sorting the data

With the framework in place we are now ready to deal with the matter of the subject. Before even starting this section I would like to point to Wikipedia’s section on sorting algorithms, since I find the description rather extensive and well written. The entry covers stability, memory usage and running time comparisons.

As mentioned earlier I will pick three strategies for the sorting. The first approach is to use the built in sorting function called Array.Sort(). However, one of the names in the list starts with AA, which at least on my machine is treated as the Danish/Norwegian letter Å, which comes in the end of the alphabet, and thus messes up the sorting. So we have to implement our own comparison function for this approach to work.  I will give you the source code I used to implement the comparison for our own sorting algorithms, but it is rather IComparer interface, and you can check the source code to see the specific details.

In C# I have implemented it as

private bool CompareStrings(string str1, string str2) {
    int i = 0;
    while (i < str1.Length && i < str2.Length) {
        if (str1[i] == str2[i]) {
            i++;
        } else {
            return str1[i] < str2[i];
        }
    }
    return str1.Length <= str2.Length;
}

The main method is implemented as

Array.Sort(names, new mySorter());

and the execution gives

The sum of all names are: 871198282
Solution took 8 ms

so it is a rather fast implementation.

Second I will implement a Bubble sort which is easy to understand and one of the first sorting algorithms I learned about. However, it is also very slow.

The last algorithm I will implement is a Merge Sort which is much faster, but uses more memory than the buble sort. One common property for the two algorithms is that they are stable, meaning that they will work even if the same key appears more times. I expect the .NET implementation to be a merge sort as well, since it is very used as a standard implementation in many languages

Buble Sort

The Bubble sort is an algorithm which steps through the list swapping items two items at a time until they are in the correct relative order. The algorithm continues to run through the set until it is completely sorted. It has a worst case running time of O(n2) which means it scales poorly.

My implementation looks like

private string[] BubleSort(string[] strings) {
    bool changed = true;

    while (changed) {
        changed = false;
        for (int i = 1; i < strings.Length; i++) {
            if (!CompareStrings(strings[i - 1], strings[i])) {
                string temp = strings[i - 1];
                strings[i - 1] = strings[i];
                strings[i] = temp;
                changed = true;
            }
        }
    }
    return strings;
}

and the implementation in the main method is

names = BubleSort(names);

The execution using buble sort gives us

The sum of all names are: 871198282
Solution took 699 ms

So a factor 90 slower than the .NET implementation. Lets see if we can improve it with a Merge Sort.

Merge Sort

Merge Sort is a stable sorting algorithm which works has a good worst case running time of O(n log n).

Conceptually, a merge sort works as follows (quote from wikipedia)

  1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
  2. Divide the unsorted list into two sublists of about half the size.
  3. Sort each sublist recursively by re-applying merge sort.
  4. Merge the two sublists back into one sorted list.

My implementation in C# looks as

private string[] MergeSort(string[] strings) {
    if (strings.Length <= 1) {
        return strings;
    }

    int firstPart = strings.Length / 2;
    string[] strings1 = new string[firstPart];
    string[] strings2 = new string[strings.Length - firstPart];

    string[] sorted = new string[strings.Length];

    // Split the array into two;
    for (int i = 0; i < firstPart; i++) {
        strings1[i] = strings[i];
    }

    for (int i = firstPart; i < strings.Length; i++) {
        strings2[i - firstPart] = strings[i];
    }

    strings1 = MergeSort(strings1);
    strings2 = MergeSort(strings2);

    int j = 0;
    int k = 0;

    for (int i = 0; i < sorted.Length; i++) {
        if(j == strings1.Length){
            sorted[i] = strings2[k];
            k++;
        } else if(k == strings2.Length){
            sorted[i] = strings1[j];
            j++;
        }else if(CompareStrings(strings1[j], strings2[k])){
            sorted[i] = strings1[j];
            j++;
        } else {
            sorted[i] = strings2[k];
            k++;
        }
    }

    return sorted;
}

A bit longer piece of code compared to the bubble sort, but it should be easy enough to follow from the conceptual description. The execution looks like

The sum of all names are: 871198282
Solution took 8 ms

So we are able to create a sorting algorithm as fast as the implemented. However, we might as well just use the supplied functionality.

Closing remarks

This blog post treated the solution of Problem 22, of Project Euler, which is all about sorting a list of strings. I gave you three different methods to sort the list, and found that the fastest own implementation I could provide is as fast as the supplied .NET implementation. The only advantage to implement it yourself beside the educational aspect, is that you can customize it later on.

As usual the source code is available for you to study. And remarks, comments and questions are encouraged.

19 Comments For This Post I'd Love to Hear Yours!

  1. Chris says:

    var names = File.ReadAllText(“C:\\names.txt”).Split(‘,’).Select(x => x.Trim(‘”‘).Trim()).OrderBy(x => x);
    var result = names.Select((x, i) => x.ToCharArray().Sum(y => (int)y – 64) * (i + 1)).Sum();

  2. Kristian says:

    Hi Chris

    Thanks for your comment with another solution. It is a lot shorter using linq as you do.

    Maybe it is only on my system it poses a problem due to my language settings. However, when I run the code there is one small problems you need to address. One of the names contain AA, which is interpreted as the Norwegian/Danish letter Å, and thus the solution ends up being 2 off.

    I am not familiar enough with the linq syntax to be able to remedy this. But I am pretty sure you can write a a line of code to tell it how to sort things.

    /Kristian

  3. Kristian says:

    Hi again Chris

    Once again thank you for submitting your approach. I have been meaning to look into Linq for a while. I played around with your solution, and while I do think that it is more readable in some ways, it is certainly slower. Not so much that it matters. I was able to execute it in 38ms vs. 8ms on the fastest of the solutions I presented. I does show that the more abstract code (as I see Linq) also performs a bit slower since it uses lists instead of arrays and so on.

    That being said, I do really appreciate your solution, and I am inspired by it, so maybe one day you will see this type of solutions from me as well.

    /Kristian

  4. SuprDewd says:

    I’m pretty sure LINQ is neither using Lists nor arrays.
    Underneath, they’re using some kind of a state machine and somehow pipe elements through to the foreach loop.
    But, of course, it isn’t as fast as using arrays.

    I love LINQ and all it’s components. It *usually* makes code a lot more elegant and shorter.

  5. Kristian says:

    After looking a bit more, I think you might very well be right. It would be fun to learn a bit more about. Does anyone have some good sources both for an introduction to Linq and what happens behind the scenes when using linq?

    /Kristian

  6. SuprDewd says:

    Microsoft has a collection of 101 LINQ examples here: http://msdn.microsoft.com/en-us/vcsharp/aa336746
    They also have a set of introduction articles:
    http://msdn.microsoft.com/en-us/library/bb397933.aspx

    I’ve read these two books and can recommend them:
    http://www.amazon.com/LINQ-Objects-Using-4-0-Addison-Wesley/dp/0321637003/ref=sr_1_1?ie=UTF8&qid=1306500550&sr=8-1
    http://www.amazon.com/Essential-LINQ-Charlie-Calvert/dp/0321564162/ref=sr_1_1?ie=UTF8&qid=1306500611&sr=8-1

    I still think the best way to learn is to try things out, and use google if you run into issues.

  7. Jens says:

    I was working on this problem and hit the same curious case of AARON being the last element of a sorted list, seems there is an easy workaround however, when sorting, you can choose which kind of string comparer to use, but instead of rolling your own, you can chose one of the build-in ones that uses an invariant culture: Array.Sort(StringComparer.InvariantCulture), then it can do proper alphabetical sorting.

  8. Kristian says:

    Ah nice, I didn’t know you could do that to avoid the special case here with Danish and similar languages.

  9. Patrick says:

    Hey, this is an incredible resource…but I am learning c++ not c#, though I am aware that I will need to know both and am capable of converting from one to another but its quite touch and go because i never formerly attempted to learn c#..anyway, my point is, is there a resource as incredible as this that explains euler in c++?

  10. Kristian says:

    Hi Patrick

    I don’t think there is any resource as great as this… (okay, that was cheap). I haven’t found anything specifically for c++, but on the other hand I haven’t really looked.

    For the majority of the problems here, I think the big part is figuring out a method for solving it, after that the implementation in a specific language should be the lesser part of it. I might be wrong though, but that is the way it is for me.

    Also, I think that you should make a solution for the problem, and then afterwards maybe read the blog to see if there are other ways of solving the problem. Thus you learn c++ from your initial implementation.

  11. Bjarki Ágúst says:

    I think you should rather look for an introduction to C#, and then come back here when you’re a little more comfortable with it. In my opinion, C++ and C# aren’t really that different. The syntax is nearly the same, and the only thing that differs significantly are the libraries, but if there’s something you don’t recognize (for example, the StreamReader class), a simple Google search should be enough.

  12. Patrick says:

    thank you, ya I have been fairly comfortable translating C# into C++ in the past and like I said I really should learn both for the job I’m going for so maybe its just time I buckle down on C#, especially since, like you said, it’s not so different and therefore relatively easy to learn once a person’s already familiar with C+…Thanks Again!

  13. Gopinath says:

    .NET implements Quick sort… an unstable one. http://msdn.microsoft.com/en-us/library/w56d4y5z.aspx

  14. Gopinath says:

    Here is a more detailed source on what basis the framework chooses to implement a algorithm.. .NET uses insertion,heap sort and as well as Quick sort..
    http://msdn.microsoft.com/en-us/library/aw9s5t8f.aspx

  15. Stian says:

    Kristian, just add:

    Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");

    then the AA’s will be treated as you want 🙂

  16. Kristian says:

    I knew there had to be something like that 🙂

  17. Raynor says:

    Hello, i’ve been working on this problem on my own for a while, it gave me a different result, with almost the same code. I can not find why my number is ~1000 off from the right answer. Any suggestions? I can send you my code if you give me an email adress. I use the built in Array.Sort() method. (.net 4.5 VS2012 C#). My array is a list of strings. No doublecharacter issue (AA is treated as AA). Tried to check the sorting, it seems correctly sorted. My list got 5163 elements in it.

    Thanks for your help!

  18. Jean-Marie Hachey says:

    Table 1
    The total of all and five-name scores in the file of first names.
    [Re: Project Euler – Problem 22]

    http://img4.hostingpics.net/pics/890182pe22table1names.jpg

    The complete list contains 5 166 names.
    The five first: “MARY”, “PATRICIA”, “LINDA”, “BARBARA”, “ELIZABETH”.
    Execution times measured is the average of 3 consecutives determinations.

    P.S.
    Access to the list of 5 166 names:
    https://projecteuler.net/project/resources/p022_names.txt
    This link in the article is not available:
    http://projecteuler.net/project/names.txt

    ___

    Sources:
    1) PE22, Kristian’s algorithm
    http://www.mathblog.dk/files/euler/Problem22.cs
    2) Microsoft Visual C# 2010 Express
    3) Microsoft Office Excel (2007)
    4) Microsoft Office Word (2007)

  19. Mateus says:

    A very simple solution

    string[] names = File.ReadAllText(“names.txt”).Replace(“\””,””).Split(‘,’);

    Array.Sort (names);

    ulong breakp = Convert.ToUInt64(names.Length), sum = 0;

    for (ulong x = 0; x < breakp; x++)
    {
    ulong aux = 0;
    for (int y = 0; y < names [x].Length; y++)
    {
    aux += Convert.ToUInt64 (names [x] [y]) – 64;
    }
    aux *= (x + 1);
    sum += aux;
    }

    Console.WriteLine (sum);

    Don't forget to add references to System.IO;

Leave a Comment Here's Your Chance to Be Heard!

You can use these html tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

You can use short tags like: [code language="language"][/code]

The code tag supports the following languages: as3, bash, coldfusion, c, cpp, csharp, css, delphi, diff,erlang, groovy, javascript, java, javafx, perl, php, plain, powershell, python, ruby, scala, sql, tex, vb, xml

You can use [latex]your formula[/latex] if you want to format something using latex

This site uses cookies.