Project Euler 89: Develop a method to express Roman numerals in minimal form.

Project Euler 89: Develop a method to express Roman numerals in minimal form.

Written by on 24 March 2012

Topics: Project Euler

I feared that the problems would become harder and harder, and after solving problem 88 I feared Problem 89 of Project Euler. Luckily for me I was wrong, problem 89 is not that difficult. The problem reads

The rules for writing Roman numerals allow for many ways of writing each number (see About Roman Numerals…). However, there is always a “best” way of writing a particular number.

For example, the following represent all of the legitimate ways of writing the number sixteen:

IIIIIIIIIIIIIIII
VIIIIIIIIIII
VVIIIIII
XIIIIII
VVVI
XVI

The last example being considered the most efficient, as it uses the least number of numerals.

The 11K text file, roman.txt  contains one thousand numbers written in valid, but not necessarily minimal, Roman numerals; that is, they are arranged in descending units and obey the subtractive pair rule (see About Roman Numerals… for the definitive rules for this problem).

Find the number of characters saved by writing each of these in their minimal form.

Note: You can assume that all the Roman numerals in the file contain no more than four consecutive identical units.

My first thoughts regarding this problem was to make a converter for converting romans to arabic numbers and then a converter the other way as well. However after analyzing the problem a bit more I realized that the number of replacements that we can make is limited since we know that there are no more than 4 consecutive digits and we know that they are ordered unless they are already using the subtractive rule.

What we need to look for is where we can apply the subtractive rule our self and thereby shorten the numeral representation even further. There are three rules according to the explaining text.

  1. We can use I in front of V and X, meaning that we can convert IIII into IV and VIIII into IX
  2. We can use X in front of L and C, meaning that we can convert XXXX into XL and VXXX into XC
  3. We can use C in front of D and M, meaning that we can convert CCCC into CD and DCCCC into CM

That gives us a grand total of 6 substitution patterns to look for which all needs to be substituted with a 2 character replacement. So for counting purposes we don’t even need to make the correct substitutions, we just need to substitute all 6 patterns with a 2 character replacement and thus get the correct minimal string length.

What better method to do this than use one of my favorite tools, regular expressions? I love working with it, as it is a really powerful tool. But they are difficult to write correctly I think. However, for this problem they are just brilliant and this is why I chose to use them. For the .NET platform there is a guide for using regular expressions at MSDN library. If you want a more thorough walk through of Regular Expressions I can recommend Mastering Regular Expressions by Jeffrey Friedl.

Our pattern today is very simple though. So lets get right to the C# code for the problem

string roman = File.ReadAllText(filename);
string pattern = "DCCCC|LXXXX|VIIII|CCCC|XXXX|IIII";
string replacement = "kk";
int result = roman.Length - Regex.Replace(roman, pattern, replacement).Length;

There you go, four lines of code and the problem is solved. I believe I could compress it to one line, but that would be ugly. This is assuming that the filename is parsed as a parameter.

Running this code gives us the following result

743 characters shorter
Solution took 4,8019 ms

So it doesn’t even take that long to do the substitutions.

Wrapping up

Talking about romans always makes me think of Life Of Brian, and especially when we are talking substitutions the Pontius Pilate has a slight speech impedient resulting in sentences like “I’ve had enough of this wowdy webel sniggewing behaviour. Silence! Call yourselves Pwaetowian guards? You’re not – Seize him! Seize him! Blow your noses and seize him! “. An absolute must see movie in my opnion.

Anyway, back to the problem which was solved rather easily. You can find the source code here.

The beautiful image of roman numerals on a building was taken by Craig Hamnett who kindly shared it under the creative commons license which allows me to reuse it.

That was all I had to say about the solution for this little problem.

6 Comments For This Post I'd Love to Hear Yours!

  1. john says:

    Nice solution. I did go for your first though to use two-way converters. In a first step, I convert MMDCCCLXXXIII to 2883. And the second step converts this into an optimal roman, i.e. MMCCMXXCIII, saving 2 characters. But then I found out that the example I give here is not allowed. Is the substraction rule only allowed to be applied once? Or can there be at most one substraction character only? I did not find the document explaining Roman numerals very clear (but I’m also not a native speaker). Anyways, it was nice to read your elegant solution again, thanks, John.

  2. Kristian says:

    Yes in general the subtraction rule can only be applied once.

  3. George says:

    I used a more brute force method; clumsier, but took only 3ms.

    I’m currently learning C# at age 76 (trying to avoid Alzheimers), although I have used BASIC, Z80 assembler and Borland C,

    I love your solutions, but only look at them when I’m stuck.

    Thanks for your Posts.

  4. Kristian says:

    You are the living proof that you will never be to old to learn new things. I have programmed in Basic and Borland C as well (not that I remember it any more) but not Z80 assembler.

    Thanks for the comment and good luck on the problem solving. It does really keep you going doesn’t it.

  5. Bob says:

    This is very clever! Nice job as usual

  6. pbl says:

    Ok, this is the first time I look on the web out there for a solution, because I was very much convinced that my answer (740) was right, and pissed that it does not work. But after examining where your answer differs from mine for length, I found out an embarrassing mistake: the way I loaded the file ‘roman.txt’, I just killed the last character of every line, to get rid of the newline. Yeaahh, and that just corrupted the last numeral… which is exactly where your method and mine were in disagreement. Duh.

Leave a Comment Here's Your Chance to Be Heard!

You can use these html tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

You can use short tags like: [code language="language"][/code]

The code tag supports the following languages: as3, bash, coldfusion, c, cpp, csharp, css, delphi, diff,erlang, groovy, javascript, java, javafx, perl, php, plain, powershell, python, ruby, scala, sql, tex, vb, xml

You can use [latex]your formula[/latex] if you want to format something using latex

Notify me of comments via e-mail. You can also subscribe without commenting.