Project Euler 98: Investigating words, and their anagrams, which can represent square numbers.

Project Euler 98: Investigating words, and their anagrams, which can represent square numbers.

Written by on 26 May 2012

Topics: Project Euler

Problem 98 of Project Euler reads

By replacing each of the letters in the word CARE with 1, 2, 9, and 6 respectively, we form a square number: 1296 = 362. What is remarkable is that, by using the same digital substitutions, the anagram, RACE, also forms a square number: 9216 = 962. We shall call CARE (and RACE) a square anagram word pair and specify further that leading zeroes are not permitted, neither may a different letter have the same digital value as another letter.

Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, find all the square anagram word pairs (a palindromic word is NOT considered to be an anagram of itself).

What is the largest square number formed by any member of such a pair?

NOTE: All anagrams formed must be contained in the given text file.

I haven’t been able to find anything but a brute force solution which consists of two steps. First step is to identify anagram word pairs in the file. This is done simply by sorting the letters in each word and then look for two words that are equal. The code for this part looks like

string[] words = File.ReadAllText(filename).
                 Replace("\"", "").Split(',');
char[][] sorted = new char[words.Length][];

for (int i = 0; i < words.Length; i++) {
    sorted[i] = words[i].ToCharArray();
    Array.Sort(sorted[i]);
}

for (int i = 0; i < words.Length; i++) {
    for (int j = i + 1; j < words.Length; j++) {

    if (sorted[i].Length != sorted[j].Length) continue;
        bool isEqual = true;
        for (int k = 0; k < sorted[i].Length; k++) {
            isEqual = sorted[i][k] == sorted[j][k];
            if (!isEqual) break;
    }

    if (isEqual) {
        //Found an anagram wordpair
        //Check if they are both square
    }
}

Yep, it is pure bruteforce, but I haven’t been able to find a better method for doing this.

Once we find an anagram word pair we need to check if both of them are squares.

Checking if the anagram pair is square

The first thing I need in order to do this is a list of all squares with the same amount of digits as there are letters in the words. I have cheated a bit and knows that the longest anagram there is has 8 characters, so if we square all numbers up to 31700 we should be okay.

The for every number with the right amount of digits, we can build a dictionary where we map letters and digits to each other in the right order. In the process we need to check for two things.

  1. If the character already exists in the word, then it must have the same digit value
  2. If it passes point 1. we need to check if the digit we are trying to map already exists. In that case they do not match.

Once we have made this map, we can generate a number out of the second word using the dictionary. The only thing we need to ensure is that the first digit is a nonzero digit. Once we have this number we can check if it is square.  This can be done either by checking if the square root is an integer or by checking if the number exist in the square list. The two methods are about the same speed.

Once we have looped through all squares we return the largest value that matches for that pair.

private int SquareAnagram(string word1, string word2) {
    int max = 0;
    char[] w1array = word1.ToCharArray();
    char[] w2array = word2.ToCharArray();

    for (int i = 0; i < squares.Length; i++) {
        int squareLength = squares[i].ToString().Length;

        //Too short, keep looking
        if (squareLength < word1.Length)
            continue;

        //Too int, stop search
        if (squareLength > word1.Length)
            break;

        bool match = true;

        int square = squares[i];
        Dictionary<char, int> map = new Dictionary<char, int>();

        //Make a map out of the first word
        for (int j = w1array.Length-1; j >= 0 ; j--) {
            int digit = square % 10;
            square /= 10;

            //A repeated letter is found which
            //doesn't match the square pattern
            if (map.ContainsKey(w1array[j])) {
                if(map[w1array[j]] == digit){
                    continue;
                } else {
                    match = false;
                    break;
                }
            }

            //The value is already used
            if(map.ContainsValue(digit)){
                match = false;
                break;
            }

            map.Add(w1array[j], digit);
        }

        if (!match) continue;

        //Check if the map can be used for word 2
        int w2value = 0;
        if (map[w2array[0]] == 0) {
            match = false;
        } else {
            for (int j = 0; j < w2array.Length; j++) {
                w2value = w2value * 10 + map[w2array[j]];
            }
        }

        if (!match) continue;
        if (Array.BinarySearch(squares, w2value) > -1)  {
            int maxpair = Math.Max(w2value, squares[i]);
            max = Math.Max(max, maxpair);
        }
    }
    return max;
}

Executing the code

We still need a bit of checking to find the largest number among all the anagram square pairs, but you can take a look at the uploaded source code to see that. However, once that is in place we can execute the program with the result

The largest square number is 18769
Solution took 49,3792 ms

Not too shabby I think. However, we can improve the anagram finding a bit.

Hashing the words

After seeing my solution Bjarki proposed that instead of the O(n2) complexity of the anagram finding I proposed initially I could use hashing to find anagrams by making a dictionary where the keys are the words rearranged such that the letters are sorted. Then all words with the same letters will be thrown into a bin with that hash and thus we have all the anagrams. For example: CARE will be hashed to ACER, and RACE will also be hashed to ACER.

Once we have all the hashes we just need to look into each bin and do the square matching if there are multiple words in the bin.

The code for this looks like

Dictionary<string, List<string>> anagrams = new Dictionary<string, List<string>>();
string[] words = File.ReadAllText(filename).Replace("\"", "").Split(',');
foreach (string name in words) {
    string key = new String(name.ToCharArray().OrderBy(i => i).ToArray());

    if (!anagrams.ContainsKey(key)) {
        anagrams.Add(key, new List<string>());
    }

    anagrams[key].Add(name);
}

foreach (KeyValuePair<string, List<string>> anagram in anagrams){
    if (anagram.Value.Count <= 1) continue;

    for (int i = 0; i < anagram.Value.Count; i++) {
        for (int j = i + 1; j < anagram.Value.Count; j++) {
            int pairvalue = SquareAnagram(anagram.Value[i], anagram.Value[j]);
            if (pairvalue > result)
                result = pairvalue;
        }
    }
}

This executes in 39ms, which means 20% faster.

Wrapping up

We have now solved this problem first bruteforce searching for anagrams and then by making a hashtable for the words. We also managed to get the execution time down to something reasonable.

The code as always is available here.

How did you do it?

6 Comments For This Post I'd Love to Hear Yours!

  1. Kal says:

    The longest anagram is actually 9 characters:

    INTRODUCE, REDUCTION

    Luckily, it is the only one, so I imagine it’s just a matter of showing that it doesn’t form an anagram square. Haven’t finished this problem yet, though.

  2. Jean-Marie Hachey says:

    Hi Kristian,

    I have a problem in saving the « input.txt » for PE98.
    (It seems that an I/O operation failed unrecoverably…)

    Here is the procedure I follow to create a file text:

    1) Right click on …
    Using words.txt (right click and ‘Save Link/Target As…’),
    2) Save it on my computer desktop.
    (The file appears on my desktop under the name: « words ».
    3) Run your algo for PE98 :
    http://www.mathblog.dk/files/euler/Problem098.cs
    4) Output received from Microsoft Visual C# 2010 Express:
    FileNotFoundException was unhandled
    Troubleshooting tips:
    Verify that the file exists in the specified location.

    ___

    Do you have any suggestions to improve my procedure ?
    Always a pleasure enjoying your exceptional blog.

    Thanking you in advance,

    Jean-Marie

  3. Kristian says:

    Hi Jean-Marie

    In my solution the file is called input.txt, and it seems yours is called words. I think that might be the problem.

  4. Jean-Marie Hachey says:

    Hi Kristian,

    Thank you for your answer.

    I should have specified in my question that I change the name of the file from « words » to « input.txt ».

    I always received the same output from Microsoft Visual C# 2010 Express.

    ___

    I had the same output with the following problems using System.IO:

    Project Euler 18: Maximum Sum from Top to Bottom of the Triangle
    http://www.mathblog.dk/project-euler-18/

    Project Euler 102: For how many triangles in the text file does the interior contain the origin?
    http://www.mathblog.dk/project-euler-102-triangles-contain-origin/

    Thanks again :-)

  5. Kristian says:

    You could try printing out the path to the console, to see where the program thinks the file should be.

    so somewhere after the filename variable has been declared use
    Console.WriteLine(filename);

  6. Jean-Marie Hachey says:

    As pointed out in the article:
    The largest square number is 18769.

    Searching for the digital anagram to be paired with 18769…

    Table 1
    Correspondence of digital and alphabetical anagrams.
    Re: Project Euler, Problem 98
    Investigating words, and their anagrams, which can represent square numbers.

    http://img11.hostingpics.net/pics/921619PE98table1.jpg

    Step#1:
    Calculations :
    1+8+7+6+9 =31
    1x8x7x6x9 = 3024
    (18769)^(0.5) = 137

    Step#2:
    List of 5-different-digit squares (digits > 0) between 100 (100^2=10000) and 316 (316^2=99856); (317)^2=100489].
    A portion of this list showing values with sum=31 and product=3024 is presented in Table 1. We also see four digital anagrams:
    17689, 18769, 78961, 81796.
    The algo identifies one of the digital anagram (18769): what is the second one to be compared with the 5-letter anagrams contained within the list of nearly 2 000 words ?

    Step#3:
    Extracting 5-different-letter anagrams from the big list.
    This is done easily with word processing.
    So, we are looking for a pair of alphabetical anagrams with the same sequence as the one observed between 18769 and 17689 (17689 beeing our first candidate in pair searching of anagrams).

    Excerpt from the big list : Table 1 gives the letter anagram pair corresponding to the digital-anagram pair, both anagrams having the same sequence of characters.

    In comparison to our 2 first 5-digit candidates:
    18769
    and
    17689,
    the list gives:
    BROAD
    and
    BOARD
    Effectively, BROAD and BOARD is the only anagram pair found in the list.
    And …
    BOARD=133^2
    BROAD=137^2

    ___

    Sources:
    1) Project Euler 98
    Kristian’s algorithm
    http://www.mathblog.dk/files/euler/Problem98.cs
    2) Microsoft Visual C# 2010 Express
    3) Microsoft Office Excel (2007)
    4) Microsoft Office Word (2007)

Leave a Comment Here's Your Chance to Be Heard!

You can use these html tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

You can use short tags like: [code language="language"][/code]

The code tag supports the following languages: as3, bash, coldfusion, c, cpp, csharp, css, delphi, diff,erlang, groovy, javascript, java, javafx, perl, php, plain, powershell, python, ruby, scala, sql, tex, vb, xml

You can use [latex]your formula[/latex] if you want to format something using latex

Notify me of comments via e-mail. You can also subscribe without commenting.