views:

316

answers:

8

So I'm really interested in Erlang. I can't find an excuse to use it for anything big, but I try to use it for toy problems from time to time.

Right now, I'm implementing a Roman Numeral translator. I'm just doing the "to" part for now, and I'm finding the code is awfully repetitive. It works like a charm, but, well, just look at it:

-module(roman).
-compile([export_all]).

toRoman(N) ->
    toRoman(N, []).

toRoman(0,Acc) ->
    lists:reverse(lists:flatten(Acc));

toRoman(N, Acc) when N >= 1000 ->
    toRoman(N-1000,["M"|Acc]);

toRoman(N,Acc) when N >= 900 ->
    toRoman(N-900,["CM" | Acc]);

toRoman(N,Acc) when N >= 500 ->
    toRoman(N-500,["D" | Acc]);

toRoman(N,Acc) when N >= 400 ->
    toRoman(N-400, ["CD" | Acc]);

toRoman(N,Acc) when N >= 100 ->
    toRoman(N-100, ["C" | Acc]);

toRoman(N,Acc) when N >= 90 ->
    toRoman(N-90, ["XC" | Acc]);

toRoman(N,Acc) when N >= 50 ->
    toRoman(N-50, ["L" | Acc]);

toRoman(N, Acc) when N >= 40 ->
    toRoman(N-40, ["XL" | Acc]);

toRoman(N, Acc) when N >= 10 ->
    toRoman(N-10, ["X" | Acc]);

toRoman(N, Acc) when N >= 9 ->
    toRoman(N-9, ["IX" | Acc]);

toRoman(N, Acc) when N >= 5 ->
    toRoman(N-5, ["V" | Acc]);

toRoman(N, Acc) when N >= 4 ->
    toRoman(N-4, ["IV" | Acc]);

toRoman(N, Acc) ->
    toRoman(N-1, ["I" | Acc]).

test() ->
    Test = fun(X) -> io:format("~p -> ~p~n", [X, toRoman(X)]) end,
    lists:map(Test, [0,1,3,6,23,43,75,87,13,23, 3999, 3998, 2531, 140]).

I feel like there's a much better way to do this. Can anyone provide some insight?

+6  A: 

Actually your code isn't that repetitive. It just looks like it because the text forms a repetitive pattern. But each of your clauses handles a specific case with little logical overlap between them. You could reimplement in a switch statement but you would get a similar repetition. There are just too many cases in a roman numeral translation and I don't think you will be able to avoid the repetitive feel making each of those individual decisions causes.

Jeremy Wall
I guess not, and for the sake of readability and maintainability this is probably how I would do it in the unlikely circumstance that I was writing production roman-numeral translation code. (Fixing the bugs that have been pointed out, of course.)
Adam Bard
A: 

Using a lookup table should make it shorter and faster in any language.

obecalp
Can you show how exactly you'd do that for this situation?
musicfreak
this isn't one-to-one mappings. He actually has to compare greater than or equal to. That's a good deal harder in a lookup table. I don't think it would be at all superior in this case.
Jeremy Wall
Unless he generates a lookup table with 4000 entries for all possible numbers :)
Zed
Are you sure? I'm not sure. Can you show us benchmark result for this example?
Hynek -Pichi- Vychodil
+1  A: 

This isn't repetitive, as the "logic" must be implemented anyway. One thing you could do is to make it non tail-recursive, as you won't have more than 20-30 recursions anyway...

-module(roman).
-compile([export_all]).

to_roman(N) when N >= 1000 -> "M"  ++ to_roman(N-1000);
to_roman(N) when N >=  900 -> "CM" ++ to_roman(N- 900);
...
to_roman(N) when N >=    4 -> "IV" ++ to_roman(N-   4);
to_roman(N) when N >=    1 -> "I"  ++ to_roman(N-   1);
to_roman(_) -> [].

You can further save some characters by defining a macro. I do hate macros, but you might like 'em :).

-module(roman).
-compile([export_all]).

-define( TO_ROMAN(L, C) , to_roman(N) when N >= L -> C ++ to_roman(N-L) ).

?TO_ROMAN(1000,  "M");
?TO_ROMAN( 900, "CM");
...
?TO_ROMAN(   4, "IV");
?TO_ROMAN(   1,  "I");
to_roman(_) -> [].
Zed
+4  A: 

If you want not be repetitive, you can inspire by mine Code Golf New Year Edition - Integer to Roman Numeral contribution.

-module(n2).
-export([y/1]).
-define(D(V,S),n(N)when N>=V->[??S|n(N-V)];).
y(N)->io:format(n(N)).
?D(1000,M)?D(900,CM)?D(500,D)?D(400,CD)?D(100,C)?D(90,XC)?D(50,L)?D(40,XL)?D(10,X)?D(9,IX)?D(5,V)?D(4,IV)?D(1,I)n(0)->[10].

It is not nice and recommended way to write code in erlang. Macros are bad. If you can, avoid it. It is hard to debug, it introduces intermodule dependencies which are not tracked by hot code swap, and so and so. If you like more functional like approach "code is data, data is code" look at this as example:

-module(roman).

-compile([export_all]).

toRoman(N) when is_integer(N), N >= 0 ->
    toRoman(N,
        [{1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"},
         {100, "C"}, {90, "XC"}, {50, "L"}, {40, "XL"},
         {10, "X"}, {9, "IX"}, {5, "V"}, {4, "IV"}, {1, "I"}]).

toRoman(0, _) -> [];
toRoman(N, [{X, V} | _] = S) when N >= X ->
    [V | toRoman(N - X, S)];
toRoman(N, [_ | S]) -> toRoman(N, S).

test() ->
    F = fun (X) -> lists:flatten(toRoman(X)) end,
    "" = F(0),
    "I" = F(1),
    "III" = F(3),
    "VI" = F(6),
    "XXIII" = F(23),
    "XLIII" = F(43),
    "LXXV" = F(75),
    "LXXXVII" = F(87),
    "XIII" = F(13),
    "XXIII" = F(23),
    "MMMCMXCIX" = F(3999),
    "MMMCMXCVIII" = F(3998),
    "MMDXXXI" = F(2531),
    "CXL" = F(140),
    ok.

Just for curiosity, your code is about 5% faster in bytecode and 5% slower in native than mine. It performs one translation in 1.2us in bytecode and in 370ns in native on Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz.

Edit: I have not used tail recursive version because depth of recursion is very small. So I was curious if there is any performance penalty or gain from it. I can't measure any in mine algorithm in bytecode even native but interesting thing happen in original code. If I wrote original algorithm in straight forward way (not optimized for tail call) it is about 40% faster than mine in native code (one transformation in approx 250ns). There is not measurable difference in byte code. It is interesting example where tail call optimization is not worth to do.

toRoman(N) when N >= 1000 -> "M" ++ toRoman(N - 1000);
toRoman(N) when N >= 900 -> "CM" ++ toRoman(N - 900);
toRoman(N) when N >= 500 -> "D" ++ toRoman(N - 500);
toRoman(N) when N >= 400 -> "CD" ++ toRoman(N - 400);
toRoman(N) when N >= 100 -> "C" ++ toRoman(N - 100);
toRoman(N) when N >= 90 -> "XC" ++ toRoman(N - 90);
toRoman(N) when N >= 50 -> "L" ++ toRoman(N - 50);
toRoman(N) when N >= 40 -> "XL" ++ toRoman(N - 40);
toRoman(N) when N >= 10 -> "X" ++ toRoman(N - 10);
toRoman(N) when N >= 9 -> "IX" ++ toRoman(N - 9);
toRoman(N) when N >= 5 -> "V" ++ toRoman(N - 5);
toRoman(N) when N >= 4 -> "IV" ++ toRoman(N - 4);
toRoman(N) when N >= 1 -> "I" ++ toRoman(N - 1);
toRoman(0) -> [].

P.S.: Flattening lists is not common behavior for Erlang code. Return structure in above examples is well known as io_list and is usually accepted in erlang io system. You can send it directly to sockets, ports and so. If you want for example write it you can use io:put_chars(IOList) or io:format("Result: '~s'~n", [IOList]).

EDIT2: If there is constant list as left operand of ++ operator erlang compiler will optimize list concatenation for you thus ["string" | L] is not not necessary for speed. Resulting code is more readable and result is flattened without performance penalty. Personaly if I would be interested in performace I would use this version which is little bit repetitive but is the fastest what I know and performs one transformation in 310ns in byte code and in 210ns in native.

toRoman(N) when N >= 1000 -> "M" ++ toRoman(N - 1000);
toRoman(N) -> toRomanC(N div 100, N rem 100).

toRomanC(0, N) -> toRomanX(N);
toRomanC(1, N) -> "C" ++ toRomanX(N);
toRomanC(2, N) -> "CC" ++ toRomanX(N);
toRomanC(3, N) -> "CCC" ++ toRomanX(N);
toRomanC(4, N) -> "CD" ++ toRomanX(N);
toRomanC(5, N) -> "D" ++ toRomanX(N);
toRomanC(6, N) -> "DC" ++ toRomanX(N);
toRomanC(7, N) -> "DCC" ++ toRomanX(N);
toRomanC(8, N) -> "DCCC" ++ toRomanX(N);
toRomanC(9, N) -> "CM" ++ toRomanX(N).

toRomanX(N) -> toRomanX(N div 10, N rem 10).

toRomanX(0, N) -> toRomanI(N);
toRomanX(1, N) -> "X" ++ toRomanI(N);
toRomanX(2, N) -> "XX" ++ toRomanI(N);
toRomanX(3, N) -> "XXX" ++ toRomanI(N);
toRomanX(4, N) -> "XL" ++ toRomanI(N);
toRomanX(5, N) -> "L" ++ toRomanI(N);
toRomanX(6, N) -> "LX" ++ toRomanI(N);
toRomanX(7, N) -> "LXX" ++ toRomanI(N);
toRomanX(8, N) -> "LXXX" ++ toRomanI(N);
toRomanX(9, N) -> "XC" ++ toRomanI(N).

toRomanI(0) -> [];
toRomanI(1) -> "I";
toRomanI(2) -> "II";
toRomanI(3) -> "III";
toRomanI(4) -> "IV";
toRomanI(5) -> "V";
toRomanI(6) -> "VI";
toRomanI(7) -> "VII";
toRomanI(8) -> "VIII";
toRomanI(9) -> "IX".
Hynek -Pichi- Vychodil
Wow, this solution looks familiar... :)
Zed
+2  A: 

The repetitive part is the accumulation and the function call. Moving those to a single place and things will look much better.

%%% Roman numerals ruleset
r(N) when N >= 1000 -> {N-1000, "M"};
r(N) when N >= 900 -> {N-900, "CM"};
r(N) when N >= 500 -> {N-500, "D"};
r(N) when N >= 400 -> {N-400, "CD"};
r(N) when N >= 100 -> {N-100, "C"};
r(N) when N >= 90 -> {N-90, "XC"};
r(N) when N >= 50 -> {N-50, "L"};
r(N) when N >= 40 -> {N-40, "XL"};
r(N) when N >= 10 -> {N-10, "X"};
r(N) when N >= 9 -> {N-9, "IX"};
r(N) when N >= 5 -> {N-5, "V"};
r(N) when N >= 4 -> {N-4, "IV"};
r(N) when N > 0 -> {N-1, "I"}.

roman(N, Acc) ->
  case r(N) of
    {0, R} ->
      [R | Acc];
    {N2, R} ->
      roman(N2, [R | Acc])
  end.

roman(N) ->
  list_to_binary(lists:reverse(roman(N, ""))).

By the way, you get the same result for 4 and 6:

8> [roman:toRoman(N) || N <- lists:seq(1,10)].   
["I","II","III","VI","V","VI","VII","VIII","XI","X"]

The same bug gives you that 9 and 11 are equal, and 40 and 60, 90 and 110....

Christian
+2  A: 

There are three parts to this process, a list of rules for which symbols stand for which numerals, a search through these rules to find the the next symbol, and an iteration reducing the number to zero. Each part gets a function and we have:

ruleset() -> [
    {1000, "M"},
    {900, "CM"},
    {500, "D"},
    {400, "CD"},
    {100, "C"},
    {90, "XC"},
    {50, "L"},
    {40, "XL"},
    {10, "X"},
    {9, "IX"},
    {5, "V"},
    {4, "IV"},
    {1, "I"}].

find_next(N) -> find_next(ruleset(), N).

find_next([{V, Symbol}|_], N) when N >= V -> {N - V, Symbol};
find_next([_|T], N) -> find_next(T, N).

roman(N, Acc) ->
    case find_next(N) of
          {0, R}  -> [R | Acc];
          {N2, R} -> roman(N2, [R | Acc])
    end.

roman(N) ->
    lists:append(lists:reverse(roman(N, ""))).

You could probably use lists:foldl/3 to simplify this a bit further.

cthulahoops
Doh. I could of course have moved the subtraction part out of my rule set and just return by how much the number should be subtracted. Then I would have returned constants. I of course stand by my choice to use a function for the rules rather than a lookup table. :)
Christian
+1  A: 

It gets a bit funkier if you support all the variants of roman numerals like in Excel but basically your code remains a series of big case/pattern matching headers...

Gordon Guthrie
+5  A: 

I've added this one to rosettacode.org before, reposting here. I found the solution to be pretty elegant.

-module(roman).
-export([to_roman/1]).

to_roman(0) -> [];
to_roman(X) when X >= 1000 -> "M" ++ to_roman(X-1000);
to_roman(X) when X >= 100 -> digit(X div 100, $C,$D,$M) ++ to_roman(X rem 100);
to_roman(X) when X >= 10 -> digit(X div 10, $X,$L,$C) ++ to_roman(X rem 10);
to_roman(X) when X >= 1 -> digit(X, $I,$V,$X).

digit(1,X,_,_) -> [X];
digit(2,X,_,_) -> [X,X];
digit(3,X,_,_) -> [X,X,X];
digit(4,X,Y,_) -> [X,Y];
digit(5,_,Y,_) -> [Y];
digit(6,X,Y,_) -> [Y,X];
digit(7,X,Y,_) -> [Y,X,X];
digit(8,X,Y,_) -> [Y,X,X,X];
digit(9,X,_,Z) -> [X,Z].
I GIVE TERRIBLE ADVICE
Nice, It is really fast in bytecode (2x to mine) and comparable to native of straight forward approach (approx 15% slower).
Hynek -Pichi- Vychodil
You could even make it a little faster by including the tail as an extra argument to digit/4 and thereby skipping the append. It is clearer as it is now I think.
rvirding
@rvinrding: It seems that it doesn't make measurable improvement in R13B02. It looks like that compiler does it for you magically.
Hynek -Pichi- Vychodil
I don't think tail recursion will make it that much faster (assuming we rarely calculate billions in roman numerals, which can be a pretty big assumption)
I GIVE TERRIBLE ADVICE