tags:

views:

1062

answers:

13

I'm in the process of learning Erlang. As an exercise I picked up the Sieve of Eratosthenes algorithm of generating prime numbers. Here is my code:

-module(seed2).
-export([get/1]).

get(N) -> WorkList = lists:duplicate(N, empty),
          get(2, N, WorkList, []).

get(thats_the_end, _N, _WorkList, ResultList) -> lists:reverse(ResultList);
get(CurrentPrime, N, WorkList, ResultList) -> ModWorkList = markAsPrime(CurrentPrime, N, WorkList),
                                              NextPrime = findNextPrime(CurrentPrime + 1, N, WorkList),
                                              get(NextPrime, N, ModWorkList, [CurrentPrime|ResultList]).


markAsPrime(CurrentPrime, N, WorkList) when CurrentPrime =< N -> WorkListMod = replace(CurrentPrime, WorkList, prime),
                                                                 markAllMultiples(CurrentPrime, N, 2*CurrentPrime, WorkListMod).

markAllMultiples(_ThePrime, N, TheCurentMark, WorkList) when TheCurentMark > N -> WorkList;
markAllMultiples(ThePrime, N, TheCurrentMark, WorkList) -> WorkListMod = replace(TheCurrentMark, WorkList, marked),
                                                           markAllMultiples(ThePrime, N, TheCurrentMark + ThePrime, WorkListMod).

findNextPrime(Iterator, N, _WorkList) when Iterator > N -> thats_the_end;
findNextPrime(Iterator, N, WorkList) -> I = lists:nth(Iterator, WorkList),
                                        if
                                          I =:= empty -> Iterator;
                                          true -> findNextPrime(Iterator + 1, N, WorkList)
                                        end.

replace(N, L, New)-> {L1, [_H|L2]} = lists:split(N - 1, L),
                     lists:append(L1, [New|L2]).

This code actually works :) . The problem is that I have this feeling that it is not the best possible implementation.

My question is what would be the "erlangish" way of implementing the "Sieve of Eratosthenes"

EDIT: OK, Andreas solution is very good but it is slow. Any ideas how to improve that?

+3  A: 

I approached the problem by using concurrent processing.

Source

EvilTeach
Thanks for the quick response! I'll defiantly read your post. My question is actually not related to concurrency, just plain old sequential programming. The code end up to be too long compared to a C implementation so maybe there is a better way?
Roskoto
imho, your code is quite reasonable.The intent of your question seemed to be on the order of, how can I make this more erlangish. Concurrency is the place where the language is strong. That is the reason i pointed you to mine.
EvilTeach
Getting concurrency working with the sieve is a bit of a problem, as data is not shared between subprocesses. That complicates things. This is one of the reasons I didn't use that algorithm.You should also take a look at http://en.wikipedia.org/wiki/Sieve_of_Atkinfor a different spin on things.
EvilTeach
EvilTeach, I voted up your answer because it is a good one. I still can't accept is as the final answer though. If my code is a reasonable one I'm quite disappointed from the functional programming in general. This easy algorithm can be implemented so much more easily in C.
Roskoto
Ok. I don't have a problem with that. Here is the real question then. How could concurrency be used to make the sieve work?
EvilTeach
+6  A: 

Here's a simple (but not terribly fast) sieve implementation:

-module(primes).
-export([sieve/1]).
-include_lib("eunit/include/eunit.hrl").

sieve([]) ->
    [];
sieve([H|T]) ->    
    List = lists:filter(fun(N) -> N rem H /= 0 end, T),
    [H|sieve(List)];
sieve(N) ->
    sieve(lists:seq(2,N)).
Andreas
Wow, That's exacty what I've looked for. It is way simpler and faster then mine solution. Thanks !!! Only one question what is the -include_lib("eunit/include/eunit.hrl").line about?
Roskoto
Oh, that's a piece of boilerplate that followed along. It's for using EUnit, the Erlang Unit testing library. Glad you found the code helpful :)
Andreas
EDIT: OK, Andreas solution is very good but it is slow. Any ideas how to improve that?
Roskoto
Roskoto, see my answer. This answer's code, beautiful as it is, is not the Sieve of Eratosthenes.
Darius Bacon
A: 

my fastest code so far (faster then Andrea's) is with using array:

-module(seed4).
-export([get/1]).

get(N) -> WorkList = array:new([{size, N}, {default, empty}]),
          get(2, N, WorkList, []).

get(thats_the_end, _N, _WorkList, ResultList) -> lists:reverse(ResultList);
get(CurrentPrime, N, WorkList, ResultList) -> ModWorkList = markAsPrime(CurrentPrime, N, WorkList),
                                              NextPrime = findNextPrime(CurrentPrime + 1, N, WorkList),
                                              get(NextPrime, N, ModWorkList, [CurrentPrime|ResultList]).


markAsPrime(CurrentPrime, N, WorkList) when CurrentPrime =< N -> WorkListMod = replace(CurrentPrime, WorkList, prime),
                                                                 markAllMultiples(CurrentPrime, N, 2*CurrentPrime, WorkListMod).

markAllMultiples(_ThePrime, N, TheCurentMark, WorkList) when TheCurentMark > N -> WorkList;
markAllMultiples(ThePrime, N, TheCurrentMark, WorkList) -> WorkListMod = replace(TheCurrentMark, WorkList, marked),
                                                           markAllMultiples(ThePrime, N, TheCurrentMark + ThePrime, WorkListMod).

findNextPrime(Iterator, N, _WorkList) when Iterator > N -> thats_the_end;
findNextPrime(Iterator, N, WorkList) -> I = array:get(Iterator - 1, WorkList),
                                        if
                                          I =:= empty -> Iterator;
                                          true -> findNextPrime(Iterator + 1, N, WorkList)
                                        end.

replace(N, L, New) -> array:set(N - 1, New, L).
Roskoto
A: 

Here's an even more Erlangish way of doing it. It's very similar to Andreas' solution above, but uses a "List Comprehension" rather than a function call to do the sieving:

 primes(N) -> sieve([], lists:seq(2, N)).
 sieve(P, [H|T]) -> sieve(P ++ [H], [X || X <- T, X rem H /= 0]);
 sieve(P, []) -> P.
Alnitak
It still uses the 'rem' thing which is extremely slow. Besides it is not exactly the Eratosthenes algorithm
Roskoto
A: 

I haven't studied these in detail, but I've tested my implementation below (that I wrote for a Project Euler challenge) and it's orders of magnitude faster than the above two implementations. It was excruciatingly slow until I eliminated some custom functions and instead looked for lists: functions that would do the same. It's good to learn the lesson to always see if there's a library implementation of something you need to do - it'll usually be faster! This calculates the sum of primes up to 2 million in 3.6 seconds on a 2.8GHz iMac...

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Sum of all primes below Max. Will use sieve of Eratosthenes 
sum_primes(Max) ->
    LastCheck = round(math:sqrt(Max)),
    All = lists:seq(3, Max, 2), %note are creating odd-only array
    Primes = sieve(All, Max, LastCheck),
    %io:format("Primes: ~p~n", [Primes]),
    lists:sum(Primes) + 2. %adding back the number 2 to the list

%sieve of Eratosthenes
sieve(All, Max, LastCheck) ->
    sieve([], All, Max, LastCheck).

sieve(Primes, All, Max, LastCheck) ->
    %swap the first element of All onto Primes 
    [Cur|All2] = All,
    Primes2 = [Cur|Primes],
    case Cur > LastCheck of 
     true ->
      lists:append(Primes2, All2); %all known primes and all remaining from list (not sieved) are prime
     false -> 
      All3 = lists:filter(fun(X) -> X rem Cur =/= 0 end, All2),
      sieve(Primes2, All3, Max, LastCheck)

 end.
Correction: In testing on my machine, my implementation was orders of magnitude faster than Alnitak's but less than 3X faster than Roskoto's. The differences between all of these only become appreciable at numbers over 100K or so.
Critiquing my own code, I mixed sieve functions into sum. A cleaner sieve would start: sieve(Max) -> All = lists:seq(3, Max, 2), LastCheck = round(math:sqrt(Max)), sieve([], All, Max, LastCheck).
Hi BarryE, Thank you for your interest to this question and the solution. Looking in to your code I find the Andreas idea with lists:filter(rem). Keep in mind that rem is extremely slow operration, besides is in not part of the original algorithm witch uses summing.
Roskoto
What makes your code faster is the LastCheck = round(math:sqrt(Max)) thing which if used properly in the array solution will make it the fastest solution.
Roskoto
A: 

Hi, I kind of like this subject, primes that is, so I started to modify BarryE's code a bit and I manged to make it about 70% faster by making my own lists_filter function and made it possible to utilize both of my CPUs. I also made it easy to swap between to two version. A test run shows:

61> timer:tc(test,sum_primes,[2000000]).
{2458537,142913970581}

Code:



-module(test).

%%-export([sum_primes/1]).
-compile(export_all).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%Sum of all primes below Max. Will use sieve of Eratosthenes 
sum_primes(Max) ->
    LastCheck = round(math:sqrt(Max)),
    All = lists:seq(3, Max, 2), %note are creating odd-only array
    %%Primes = sieve(noref,All, LastCheck),
    Primes = spawn_sieve(All, LastCheck),
    lists:sum(Primes) + 2. %adding back the number 2 to the list


%%sieve of Eratosthenes
sieve(Ref,All, LastCheck) ->
    sieve(Ref,[], All, LastCheck).

sieve(noref,Primes, All = [Cur|_], LastCheck) when Cur > LastCheck ->
    lists:reverse(Primes, All); %all known primes and all remaining from list (not sieved) are prime    
sieve({Pid,Ref},Primes, All=[Cur|_], LastCheck) when Cur > LastCheck ->
    Pid ! {Ref,lists:reverse(Primes, All)}; 
sieve(Ref,Primes, [Cur|All2], LastCheck) ->
    %%All3 = lists:filter(fun(X) -> X rem Cur =/= 0 end, All2),
    All3 = lists_filter(Cur,All2),
    sieve(Ref,[Cur|Primes], All3,  LastCheck).


lists_filter(Cur,All2) ->
    lists_filter(Cur,All2,[]).

lists_filter(V,[H|T],L) ->
    case H rem V of
    0 ->
        lists_filter(V,T,L);
    _ ->
        lists_filter(V,T,[H|L])
    end;
lists_filter(_,[],L) ->
    lists:reverse(L).



%% This is a sloppy implementation ;)
spawn_sieve(All,Last) ->
    %% split the job
    {L1,L2} = lists:split(round(length(All)/2),All),
    Filters = filters(All,Last),
    %%io:format("F:~p~n",[Filters]),
    L3 = lists:append(Filters,L2),
    %%io:format("L1:~w~n",[L1]),
    %%    io:format("L2:~w~n",[L3]),
    %%lists_filter(Cur,All2,[]).
    Pid = self(),
    Ref1=make_ref(),
    Ref2=make_ref(),
    erlang:spawn(?MODULE,sieve,[{Pid,Ref1},L1,Last]),
    erlang:spawn(?MODULE,sieve,[{Pid,Ref2},L3,Last]),
    Res1=receive
         {Ref1,R1} ->
      {1,R1};
         {Ref2,R1} ->
      {2,R1}
     end,
    Res2= receive
          {Ref1,R2} ->
       {1,R2};
          {Ref2,R2} ->
       {2,R2}
      end,
    apnd(Filters,Res1,Res2).


filters([H|T],Last) when H 
    [H|filters(T,Last)];
filters([H|_],_) ->
    [H];
filters(_,_) ->
    [].


apnd(Filters,{1,N1},{2,N2}) ->
    lists:append(N1,subtract(N2,Filters));
apnd(Filters,{2,N2},{1,N1}) ->
    lists:append(N1,subtract(N2,Filters)).



subtract([H|L],[H|T]) ->
    subtract(L,T);
subtract(L=[A|_],[B|_]) when A > B ->
    L;
subtract(L,[_|T]) ->
    subtract(L,T);
subtract(L,[]) ->
    L.
+1  A: 

My previous post did not get formatted correctly. Here is a repost of the code. Sorry for spamming...


-module(test).

%%-export([sum_primes/1]).
-compile(export_all).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%Sum of all primes below Max. Will use sieve of Eratosthenes 
sum_primes(Max) ->
    LastCheck = round(math:sqrt(Max)),
    All = lists:seq(3, Max, 2), %note are creating odd-only array
    %%Primes = sieve(noref,All, LastCheck),
    Primes = spawn_sieve(All, LastCheck),
    lists:sum(Primes) + 2. %adding back the number 2 to the list


%%sieve of Eratosthenes
sieve(Ref,All, LastCheck) ->
    sieve(Ref,[], All, LastCheck).

sieve(noref,Primes, All = [Cur|_], LastCheck) when Cur > LastCheck ->
    lists:reverse(Primes, All); %all known primes and all remaining from list (not sieved) are prime    
sieve({Pid,Ref},Primes, All=[Cur|_], LastCheck) when Cur > LastCheck ->
    Pid ! {Ref,lists:reverse(Primes, All)}; 
sieve(Ref,Primes, [Cur|All2], LastCheck) ->
    %%All3 = lists:filter(fun(X) -> X rem Cur =/= 0 end, All2),
    All3 = lists_filter(Cur,All2),
    sieve(Ref,[Cur|Primes], All3,  LastCheck).


lists_filter(Cur,All2) ->
    lists_filter(Cur,All2,[]).

lists_filter(V,[H|T],L) ->
    case H rem V of
    0 ->
        lists_filter(V,T,L);
    _ ->
        lists_filter(V,T,[H|L])
    end;
lists_filter(_,[],L) ->
    lists:reverse(L).


%% This is a sloppy implementation ;)
spawn_sieve(All,Last) ->
    %% split the job
    {L1,L2} = lists:split(round(length(All)/2),All),
    Filters = filters(All,Last),
    L3 = lists:append(Filters,L2),
    Pid = self(),
    Ref1=make_ref(),
    Ref2=make_ref(),
    erlang:spawn(?MODULE,sieve,[{Pid,Ref1},L1,Last]),
    erlang:spawn(?MODULE,sieve,[{Pid,Ref2},L3,Last]),
    Res1=receive
         {Ref1,R1} ->
      {1,R1};
         {Ref2,R1} ->
      {2,R1}
     end,
    Res2= receive
          {Ref1,R2} ->
       {1,R2};
          {Ref2,R2} ->
       {2,R2}
      end,
    apnd(Filters,Res1,Res2).


filters([H|T],Last) when H 
    [H|filters(T,Last)];
filters([H|_],_) ->
    [H];
filters(_,_) ->
    [].


apnd(Filters,{1,N1},{2,N2}) ->
    lists:append(N1,subtract(N2,Filters));
apnd(Filters,{2,N2},{1,N1}) ->
    lists:append(N1,subtract(N2,Filters)).



subtract([H|L],[H|T]) ->
    subtract(L,T);
subtract(L=[A|_],[B|_]) when A > B ->
    L;
subtract(L,[_|T]) ->
    subtract(L,T);
subtract(L,[]) ->
    L.

A: 

Crap, if you try to compile my code you will get an error:

 c(test).
./test.erl:71: syntax error before: '['
./test.erl:48: function filters/2 undefined
error

Line 71 should say when H is less than Last.

Thanks for your post. I'm still interested in the subject. I'll study your code when I have more time. BTW any fresh ideas how to convince my boss that Erlang is really great ? :) The dude is convinced that there is noting out there better then C++ for servers - wow:)
Roskoto
+1  A: 

Hi Roskoto, you could show your boss this: http://www.sics.se/~joe/apachevsyaws.html. And some other (classic?) erlang arguments are:

-nonstop operation, new code can be loaded on the fly.

-easy to debug, no more core dumps to analyse.

-easy to utilize multi core/CPUs

-easy to utilize clusters maybe?

-who wants to deal with pointers and stuff? Is this not the 21 century? ;)

Some pifalls: - it might look easy and fast to write something, but the performance can suck. If I want to make something fast I usually end up writing 2-4 different versions of the same function. And often you need to take a hawk eye aproach to problems which might be a little bit different from what one is used too.

  • looking up things in lists > about 1000 elements is slow, try using ets tables.

  • the string "abc" takes a lot more space than 3 bytes. So try to use binaries, (which is a pain).

All in all i think the performance issue is something to keep in mind at all times when writing something in erlang. The Erlang dudes need to work that out, and I think they will.

+1  A: 

The paper discussed here explains why the algorithm most people have been suggesting here is slow and not the classical Sieve, along with how to code the 'right' algorithm functionally, with Haskell code that could easily be ported to Erlang.

Darius Bacon
A: 

Have a look here to find 4 different implementations for finding prime numbers in Erlang (two of which are "real" sieves) and for performance measurement results:

http://caylespandon.blogspot.com/2009/01/in-euler-problem-10-we-are-asked-to.html

Cayle Spandon
+2  A: 

Here's my sieve implementation which uses list comprehensions and tries to be tail recursive. I reverse the list at the end so the primes are sorted:

primes(Prime, Max, Primes,Integers) when Prime > Max ->
    lists:reverse(Primes) ++ Integers;
primes(Prime, Max, Primes, Integers) ->
    [NewPrime|NewIntegers] = [ X || X <- Integers, X rem Prime =/= 0 ],
    primes(NewPrime, Max, [Prime|Primes], NewIntegers).

primes(N) ->
    primes(2, round(math:sqrt(N)), [], lists:seq(3,N,2)). % skip odds

Takes approx 2.8 ms to calculate primes up to 2 mil on my 2ghz mac.

turdfurguson
A: 

Simple enough, implements exactly the algorithm, and uses no library functions (only pattern matching and list comprehension). Not very powerful, indeed. I only tried to make it as simple as possible.

-module(primes).
-export([primes/1, primes/2]).

primes(X) -> sieve(range(2, X)).
primes(X, Y) -> remove(primes(X), primes(Y)).

range(X, X) -> [X];
range(X, Y) -> [X | range(X + 1, Y)].

sieve([X]) -> [X];
sieve([H | T]) -> [H | sieve(remove([H * X || X <-[H | T]], T))].

remove(_, []) -> [];
remove([H | X], [H | Y]) -> remove(X, Y);
remove(X, [H | Y]) -> [H | remove(X, Y)].
G B