Hi,

While parsing a MIME message in Erlang, I can already extract the header, body and attachments. Now I have to parse each of these parts separately.

Header structure:

Header-tag : header-value\n

Example:

Delivered-To: [email protected]\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account "mail");\n\tFri, 03 Jul 2009 16:56:03 +0530\n

So from the example above I have to extract Delivered-To: [email protected] and Received: by 1.gnu.geodesic.net (fdm 1.5, account "mail");\n\tFri, 03 Jul 2009 16:56:03 +0530\n by splitting on \n. But the second header's value contains \n\t, so the split breaks there as well. I want a strict split that breaks only on a \n that ends a header, not on the \n\t inside a value.

Thanks in advance.

+1  A: 

Incidentally, MIME headers are (almost?) the same as HTTP headers, so you can use Erlang's built-in HTTP decoding (the data must be a binary, not a string):

3> erlang:decode_packet(httph, <<"Delivered-To: [email protected]\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n">>, []).
{ok,{http_header,0,"Delivered-To",undefined,
                 "[email protected]"},
    <<"Received: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n">>}

Right, got the first header in the http_header record, and the remaining data. Bind the remaining data to a variable and decode again:

4> Rest = element(3, v(-1)).
<<"Received: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n">>
5> erlang:decode_packet(httph, Rest, []).
{more,undefined}

That returns {more,undefined}: the decoder can't tell whether the header continues on the next line until it has seen that line. We need to add the final empty line:

6> erlang:decode_packet(httph, <<Rest/binary, "\r\n">>, []).
{ok,{http_header,0,"Received",undefined,
                 "by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530"},
    <<"\r\n">>}

And when that is all that's left, we get http_eoh:

7> erlang:decode_packet(httph, <<"\r\n">>, []).
{ok,http_eoh,<<>>}
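The steps above can be wrapped in a loop. This is only a minimal sketch, not a definitive implementation: the module name mime_headers and the functions parse/1 and loop/2 are made up for illustration, and it assumes the whole header block is already in memory as a binary. Note that field names the HTTP decoder recognises (Content-Type, for example) may come back as atoms rather than strings.

```erlang
-module(mime_headers).
-export([parse/1]).

%% Hypothetical helper: decode a complete header block (a binary)
%% into a list of {Name, Value} pairs.
parse(Bin) when is_binary(Bin) ->
    %% Append an empty line so the decoder can see where the last
    %% header ends and eventually report http_eoh.
    loop(<<Bin/binary, "\r\n">>, []).

loop(Data, Acc) ->
    case erlang:decode_packet(httph, Data, []) of
        {ok, {http_header, _, Name, _, Value}, Rest} ->
            loop(Rest, [{Name, Value} | Acc]);
        {ok, http_eoh, _Rest} ->
            %% Empty line reached: all headers have been read.
            lists:reverse(Acc);
        {ok, {http_error, Line}, _Rest} ->
            {error, {bad_header, Line}};
        {more, _} ->
            {error, incomplete};
        {error, Reason} ->
            {error, Reason}
    end.
```

Call it as mime_headers:parse(HeaderBinary), where HeaderBinary holds the raw header lines.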

Hope that helps…

legoscia
+1  A: 

Do you mean something like this?

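%% Split a header block into [{Key, Value}] pairs. Buffer collects the
%% current token in reverse; a 1-tuple {Key} on top of Result means we
%% are currently reading that key's value.
split(String) ->
  split(String, [], []).


%% End of input, nothing pending.
split([], [], Result) ->
  lists:reverse(Result);

%% End of input without a trailing newline: flush the last value.
split([], Buffer, [{Key}|Result]) ->
  split([], [], [{Key, lists:reverse(Buffer)}|Result]);

%% "\n\t" is a folded (continued) line, so keep it inside the value.
split("\n\t" ++ String, Buffer, Result) ->
  split(String, "\t\n" ++ Buffer, Result);

%% A bare "\n" ends the current value.
split("\n" ++ String, Buffer, [{Key}|Result]) ->
  split(String, [], [{Key, lists:reverse(Buffer)}|Result]);

%% ": " inside a value (e.g. "Subject: Re: hi") is ordinary text.
split(": " ++ String, Buffer, [{_Key}|_] = Result) ->
  split(String, " :" ++ Buffer, Result);

%% Otherwise ": " ends the key.
split(": " ++ String, Buffer, Result) ->
  split(String, [], [{lists:reverse(Buffer)}|Result]);

%% Any other character is added to the current token.
split([C|String], Buffer, Result) ->
  split(String, [C|Buffer], Result).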

Here is the result for your input header:

> split("Delivered-To: [email protected]\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530\n").
[{"Delivered-To","[email protected]"},
 {"Received",
  "by 1.gnu.geodesic.net (fdm 1.5, account \"mail\");\n\tFri, 03 Jul 2009 16:56:03 +0530"}]
Zed
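
If all that is needed is the "strict split" from the question, a regular expression with a negative lookahead is another option. The sketch below is an illustration rather than a tested MIME parser: the module name header_lines and the functions split_lines/1 and to_pair/1 are hypothetical, the input is assumed to be a plain string (list), and string:split/2 and string:trim/2 require OTP 20 or later. It treats a newline followed by a space as a continuation too, since folded headers may use either a space or a tab.

```erlang
-module(header_lines).
-export([split_lines/1, to_pair/1]).

%% Split on "\n" only when the next character is not a space or tab,
%% so folded continuation lines stay attached to their header.
%% The trim option drops the empty string left by a trailing "\n".
split_lines(Headers) ->
    re:split(Headers, "\n(?![ \t])", [{return, list}, trim]).

%% Split one header line at the first colon only, so colons inside
%% the value are left alone.
to_pair(Line) ->
    case string:split(Line, ":") of
        [Key, Value] -> {Key, string:trim(Value, leading)};
        [_NoColon]   -> {error, {no_colon, Line}}
    end.
```

For example, [header_lines:to_pair(L) || L <- header_lines:split_lines(Headers)] gives one {Key, Value} tuple per header, with the \n\t still inside folded values.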