I feared that the problems would become harder and harder, and after solving problem 88 I feared Problem 89 of Project Euler. Luckily for me I was wrong, problem 89 is not that difficult. The problem reads

The rules for writing Roman numerals allow for many ways of writing each number (see About Roman Numerals…). However, there is always a “best” way of writing a particular number.

For example, the following represent all of the legitimate ways of writing the number sixteen:

IIIIIIIIIIIIIIII

VIIIIIIIIIII

VVIIIIII

XIIIIII

VVVI

XVI

The last example being considered the most efficient, as it uses the least number of numerals.

The 11K text file, roman.txt contains one thousand numbers written in valid, but not necessarily minimal, Roman numerals; that is, they are arranged in descending units and obey the subtractive pair rule (see About Roman Numerals… for the definitive rules for this problem).

Find the number of characters saved by writing each of these in their minimal form.

Note: You can assume that all the Roman numerals in the file contain no more than four consecutive identical units.

My first thoughts regarding this problem was to make a converter for converting romans to arabic numbers and then a converter the other way as well. However after analyzing the problem a bit more I realized that the number of replacements that we can make is limited since we know that there are no more than 4 consecutive digits and we know that they are ordered unless they are already using the subtractive rule.

What we need to look for is where we can apply the subtractive rule our self and thereby shorten the numeral representation even further. There are three rules according to the explaining text.

- We can use I in front of V and X, meaning that we can convert IIII into IV and VIIII into IX
- We can use X in front of L and C, meaning that we can convert XXXX into XL and VXXX into XC
- We can use C in front of D and M, meaning that we can convert CCCC into CD and DCCCC into CM

That gives us a grand total of 6 substitution patterns to look for which all needs to be substituted with a 2 character replacement. So for counting purposes we don’t even need to make the correct substitutions, we just need to substitute all 6 patterns with a 2 character replacement and thus get the correct minimal string length.

What better method to do this than use one of my favorite tools, regular expressions? I love working with it, as it is a really powerful tool. But they are difficult to write correctly I think. However, for this problem they are just brilliant and this is why I chose to use them. For the .NET platform there is a guide for using regular expressions at MSDN library. If you want a more thorough walk through of Regular Expressions I can recommend Mastering Regular Expressions by Jeffrey Friedl.

Our pattern today is very simple though. So lets get right to the C# code for the problem

string roman = File.ReadAllText(filename); string pattern = "DCCCC|LXXXX|VIIII|CCCC|XXXX|IIII"; string replacement = "kk"; int result = roman.Length - Regex.Replace(roman, pattern, replacement).Length;

There you go, four lines of code and the problem is solved. I believe I could compress it to one line, but that would be ugly. This is assuming that the filename is parsed as a parameter.

Running this code gives us the following result

743 characters shorter Solution took 4,8019 ms

So it doesn’t even take that long to do the substitutions.

## Wrapping up

Talking about romans always makes me think of Life Of Brian, and especially when we are talking substitutions the Pontius Pilate has a slight speech impedient resulting in sentences like *“I’ve had enough of this wowdy webel sniggewing behaviour. Silence! Call yourselves Pwaetowian guards? You’re not – Seize him! Seize him! Blow your noses and seize him! “. *An absolute must see movie in my opnion.

Anyway, back to the problem which was solved rather easily. You can find the source code here.

The beautiful image of roman numerals on a building was taken by Craig Hamnett who kindly shared it under the creative commons license which allows me to reuse it.

That was all I had to say about the solution for this little problem.

Nice solution. I did go for your first though to use two-way converters. In a first step, I convert MMDCCCLXXXIII to 2883. And the second step converts this into an optimal roman, i.e. MMCCMXXCIII, saving 2 characters. But then I found out that the example I give here is not allowed. Is the substraction rule only allowed to be applied once? Or can there be at most one substraction character only? I did not find the document explaining Roman numerals very clear (but I’m also not a native speaker). Anyways, it was nice to read your elegant solution again, thanks, John.

Yes in general the subtraction rule can only be applied once.

I used a more brute force method; clumsier, but took only 3ms.

I’m currently learning C# at age 76 (trying to avoid Alzheimers), although I have used BASIC, Z80 assembler and Borland C,

I love your solutions, but only look at them when I’m stuck.

Thanks for your Posts.

You are the living proof that you will never be to old to learn new things. I have programmed in Basic and Borland C as well (not that I remember it any more) but not Z80 assembler.

Thanks for the comment and good luck on the problem solving. It does really keep you going doesn’t it.

This is very clever! Nice job as usual

Ok, this is the first time I look on the web out there for a solution, because I was very much convinced that my answer (740) was right, and pissed that it does not work. But after examining where your answer differs from mine for length, I found out an embarrassing mistake: the way I loaded the file ‘roman.txt’, I just killed the last character of every line, to get rid of the newline. Yeaahh, and that just corrupted the last numeral… which is exactly where your method and mine were in disagreement. Duh.

3487 should be MMMCDLXXXVII (according to most people and you script), but MMMLDXXXVII is one less and it is not in violation with the rules? – so which is right and why? because, using this notation can save above 900 chars instead of only 743.

According to wikipedia you can use the following subtractive patterns

– the numeral I can be placed before V and X to make 4 units (IV) and 9 units (IX) respectively

– X can be placed before L and C to make 40 (XL) and 90 (XC) respectively

– C can be placed before D and M to make 400 (CD) and 900 (CM) according to the same pattern

You could argue that you can do it for any combination, but that is just not how the system works. No the roman numerals are in fact not the most logic based counting system there is.

I am literally in the same situation as pbl above me is. Don’t know what I would have done without you. Thank you, pbl, for preventing a mental breakdown.

I think the statement “The subtraction rule can only be applied once” is a little ambiguous.

For instance, 1999 can be represented as MCMXCIX, where you can see the subtraction rule applied 3 times. A better statement is “only one subtraction numeral can be applied for each use of the subtraction rule”. Or something like that. SO you can’t use “CCM” to represent 1800. You have to use MDCCC.

Anyway, my C++ code doing full 2-way conversion only took 1.86 ms (1,86 using European notation)