ansaurus

Question

Unexpected behavior of io:fread in Erlang

Answer 1

A:

I noticed that there are multiple instances where two numbers are merged, and it appears to be at the line boundaries on every line starting at the fourth line and beyond.

I found that if you add a whitespace character to the beginning of every line starting at the fifth, that is:

59
73 41
52 40 09
26 53 06 34
 10 51 87 86 81
 61 95 66 57 25 68
 90 81 80 38 92 67 73
 30 28 51 76 81 18 75 44
...

The numbers get parsed properly:

39> euler67:solve().
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
 68,90,81,80,38,92,67,73,30|...]

It also works if you add the whitespace to the beginning of the first four lines, as well.

It's more of a workaround than an actual solution, but it works. I'd like to figure out how to set up the format string for io:fread such that we wouldn't have to do this.

UPDATE Here's a workaround that won't force you to change the file. This assumes that all digits are two characters (< 100):

read_file(File, Data) ->
case io:fread(File, "", "~d") of
    {ok, [N] } -> 
        if
            N > 100 ->
                First = N div 100,
                Second = N - (First * 100),
                read_file(File, [First , Second | Data]);

            true ->
                read_file(File, [N | Data])
        end;
    eof ->
        lists:reverse(Data)
end.

Basically, the code catches any of the numbers which are the concatenation of two across a newline and splits them into two.

Again, it's a kludge that implies a possible bug in io:fread, but that should do it.

UPDATE AGAIN The above will only work for two-digit inputs, but since the example packs all digits (even those < 10) into a two-digit format, that will work for this example.

Vector 2009-01-28 15:47:51

Thanks! That helps. If you find a format string that works, that would certainly be helpful too. But just to make sure we are on the same page: do you think that the format string which I am using now (namely "~d") should work with my original file? In other words: there is a bug in io:fread?

Cayle Spandon 2009-01-28 20:12:08

I see no reason why it shouldn't work with the original file, but I'm still somewhat new to Erlang, so I might be missing something. It certainly could be a bug, but I'm not sure at this point.

Vector 2009-01-28 20:47:57

Your workaround also assumes that all numbers are exactly two digits -- it won't work for 1\n2, as that'll come up as 12, and won't be split.

womble 2009-01-30 23:15:15

Agreed, but note that the example input lists everything as two-digit (note the 09 on line 3 and 06 on line 4)

Vector 2009-01-31 13:37:37

Answer 2

+7 A:

I'm going to go with it being a bug in Erlang, too, and a weird one. Changing the format string to "~2s" gives equally weird results:

["59","73","4","15","2","40","0","92","6","53","0","6","34",
 "10","5","1","87","8","6","81","61","9","5","66","5","7",
 "25","6",
 [...]|...]

So it appears that it's counting a newline character as a regular character for the purposes of counting, but not when it comes to producing the output. Loopy as all hell.

A week of Erlang programming, and I'm already delving into the source. That might be a new record for me...

EDIT

A bit more investigation has confirmed for me that this is a bug. Calling one of the internal methods that's used in fread:

> io_lib_fread:fread([], "12 13\n14 15 16\n17 18 19 20\n", "~d").           
{done,{ok,"\f"}," 1314 15 16\n17 18 19 20\n"}

Basically, if there's multiple values to be read, then a newline, the first newline gets eaten in the "still to be read" part of the string. Other testing suggests that if you prepend a space it's OK, and if you lead the string with a newline it asks for more.

I'm going to get to the bottom of this, gosh-darn-it... (grin) There's not that much code to go through, and not much of it deals specifically with newlines, so it shouldn't take too long to narrow it down and fix it.

EDIT^2

HA HA! Got the little blighter.

Here's the patch to the stdlib that you want (remember to recompile and drop the new beam file over the top of the old one):

--- ../erlang/erlang-12.b.3-dfsg/lib/stdlib/src/io_lib_fread.erl
+++ ./io_lib_fread.erl
@@ -35,9 +35,9 @@
     fread_collect(MoreChars, [], Rest, RestFormat, N, Inputs).

 fread_collect([$\r|More], Stack, Rest, RestFormat, N, Inputs) ->
-    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\r|More]);
 fread_collect([$\n|More], Stack, Rest, RestFormat, N, Inputs) ->
-    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\n|More]);
 fread_collect([C|More], Stack, Rest, RestFormat, N, Inputs) ->
     fread_collect(More, [C|Stack], Rest, RestFormat, N, Inputs);
 fread_collect([], Stack, Rest, RestFormat, N, Inputs) ->
@@ -55,8 +55,8 @@
                eof ->
                    fread(RestFormat,eof,N,Inputs,eof);
                _ ->
-                   %% Don't forget to count the newline.
-                   {more,{More,RestFormat,N+1,Inputs}}
+                   %% Don't forget to strip and count the newline.
+                   {more,{tl(More),RestFormat,N+1,Inputs}}
            end;
        Other ->                                %An error has occurred
            {done,Other,More}

Now to submit my patch to erlang-patches, and reap the resulting fame and glory...

womble 2009-01-29 00:22:10

Looks like you are getting warmer. Thanks for taking the time for getting to the bottom of this.

Cayle Spandon 2009-01-29 14:13:06

Answer 3

+1 A:

Hi,

Besides the fact that it seems to be a bug in one of the erlang libs I think you could (very) easily circumvent the problem.

Given the fact your file is line-oriented I think best practice is that you process it line-by-line as well.

Consider the following construction. It works nicely on an unpatched erlang and because it uses lazy evaluation it can handle files of arbitrary length without having to read all of it into memory first. The module contains an example of a function to apply to each line - turning a line of text-representations of integers into a list of integers.


-module(liner).
-author("Harro Verkouter").
-export([liner/2, integerize/0, lazyfile/1]).

% Applies a function to all lines of the file
% before reducing (foldl).
liner(File, Fun) ->
    lists:foldl(fun(X, Acc) -> Acc++Fun(X) end, [], lazyfile(File)).

% Reads the lines of a file in a lazy fashion
lazyfile(File) ->
    {ok, Fd} = file:open(File, [read]),
    lazylines(Fd).
% Actually, this one does the lazy read ;)
lazylines(Fd) ->
    case io:get_line(Fd, "") of
        eof -> file:close(Fd), [];
        {error, Reason} ->
            file:close(Fd), exit(Reason);
        L ->
            [L|lazylines(Fd)]
    end.

% Take a line of space separated integers (string) and transform
% them into a list of integers
integerize() ->
    fun(X) ->
        lists:map(fun(Y) -> list_to_integer(Y) end,
                string:tokens(X, " \n")) end.


Example usage:
Eshell V5.6.5  (abort with ^G)
1> c(liner).
{ok,liner}
2> liner:liner("triangle.txt", liner:integerize()).
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
 68,90,81,80,38,92,67,73,30|...]

And as a bonus, you can easily fold over the lines of any (lineoriented) file w/o running out of memory :)

6> lists:foldl( fun(X, Acc) -> 
6>                  io:format("~.2w: ~s", [Acc,X]), Acc+1
6>                  end,
6>              1,  
6>              liner:lazyfile("triangle.txt")).                                        
 1: 59
 2: 73 41
 3: 52 40 09
 4: 26 53 06 34
 5: 10 51 87 86 81
 6: 61 95 66 57 25 68
 7: 90 81 80 38 92 67 73
 8: 30 28 51 76 81 18 75 44

Cheers, h.

haavee 2009-03-27 10:21:00

ansaurus

tags:

views:

answers:

Unexpected behavior of io:fread in Erlang

related questions