
Parsec is designed to parse textual information, but it occurs to me that Parsec could also be suitable for parsing binary file formats, including complex formats that involve conditional segments, out-of-order segments, etc.

Is there a way to do this, or a similar alternative package that does? If not, what is the best way in Haskell to parse binary file formats?

+4  A: 

I've used Data.Binary successfully.

volothamp
+4  A: 

You might be interested in attoparsec, which was designed for this purpose, I think.

Chris Eidhof
+2  A: 

Parsec works fine for this, though you might want to use Parsec 3, attoparsec, or iteratees. Parsec's reliance on String as its intermediate representation may bloat your memory footprint quite a bit, whereas the others can be configured to use ByteStrings.
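To make "works fine" concrete: Parsec 3's combinators are polymorphic in the token type, so a parser can run over a plain list of bytes instead of a String. A minimal sketch, assuming parsec 3 and a made-up record layout (a magic byte 0xCA, a 16-bit big-endian length, then that many payload bytes); `satisfyByte`, `anyByte`, `byte`, `word16be` and `record` are hypothetical helpers, not part of Parsec.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word16, Word8)
import Text.Parsec (Parsec, count, parse)
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (tokenPrim)

-- A Parsec parser whose input stream is a list of bytes rather than Chars.
type ByteParser a = Parsec [Word8] () a

-- Accept one byte satisfying a predicate. The column advances by one per
-- byte, so error positions count bytes rather than characters.
satisfyByte :: (Word8 -> Bool) -> ByteParser Word8
satisfyByte ok =
  tokenPrim show (\pos _ _ -> incSourceColumn pos 1)
                 (\b -> if ok b then Just b else Nothing)

anyByte :: ByteParser Word8
anyByte = satisfyByte (const True)

byte :: Word8 -> ByteParser Word8
byte b = satisfyByte (== b)

-- Big-endian unsigned 16-bit integer assembled from two bytes.
word16be :: ByteParser Word16
word16be = do
  hi <- anyByte
  lo <- anyByte
  return (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

-- Hypothetical record: magic byte, 16-bit length, then the payload.
record :: ByteParser [Word8]
record = do
  _ <- byte 0xCA
  n <- word16be
  count (fromIntegral n) anyByte
```

In GHCi, `parse record "" [0xCA, 0x00, 0x02, 0x41, 0x42]` gives `Right [65,66]`. The input is still a list, so each byte costs a cons cell; that is the memory overhead mentioned above, which the ByteString-based libraries avoid.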

Iteratees are particularly attractive because it is easier to ensure they won't hold onto the beginning of your input, and they can be fed chunks of data incrementally as they become available. This prevents you from having to read the entire input into memory in advance and lets you avoid other nasty workarounds like lazy IO.
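The iteratee libraries themselves are not shown here. As a stand-in, this is a minimal sketch of the same incremental idea using the binary package's incremental decoder (`runGetIncremental`, `pushChunk` and the `Decoder` constructors from Data.Binary.Get): chunks are pushed one at a time as they arrive, and the decoder keeps asking for more until it can finish. The message layout (a 16-bit big-endian length followed by that many bytes) and the `message`/`feedChunks` names are hypothetical.

```haskell
import qualified Data.ByteString as BS
import Data.Binary.Get (Decoder (..), Get, getByteString, getWord16be,
                        pushChunk, runGetIncremental)

-- Hypothetical message: 16-bit big-endian length, then that many bytes.
message :: Get BS.ByteString
message = do
  n <- getWord16be
  getByteString (fromIntegral n)

-- Push chunks into the decoder one at a time, as if they were arriving
-- from a socket or file; the whole input never has to be in memory at once.
feedChunks :: [BS.ByteString] -> Either String BS.ByteString
feedChunks = go (runGetIncremental message)
  where
    go (Done _ _ x)   _        = Right x
    go (Fail _ _ err) _        = Left err
    go d@(Partial _)  (c : cs) = go (pushChunk d c) cs
    go (Partial k)    []       = finish (k Nothing) -- signal end of input
    finish (Done _ _ x)   = Right x
    finish (Fail _ _ err) = Left err
    finish (Partial _)    = Left "decoder still wants input after EOF"
```

For example, `feedChunks [BS.pack [0, 3, 65], BS.pack [66, 67, 68]]` evaluates to `Right "ABC"`, even though the payload straddles the two chunks; the leftover byte is available in the `Done` constructor but ignored here.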

Edward Kmett
That Parsec lives in Text.Parsec implies it is primarily for text, not binary, right? Even for ByteStrings, it only makes the type an instance of Stream with Char as the token type. What do you mean it works fine?
me2
A: 

The best approach depends on the format of the binary file.

Many binary formats are designed to make parsing easy (unlike text formats, which are primarily meant to be read by humans). So any union data type will be preceded by a discriminator that tells you what type to expect, all fields are either fixed-length or preceded by a length field, and so on. For this kind of data I would recommend Data.Binary: typically you create a matching Haskell data type for each type in the file, and then make each of those types an instance of Binary. Define the "get" method for reading; it is a "Get" monad action, which is basically a very simple parser. You will also need to define a "put" method for writing.
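A minimal sketch of that approach, assuming the binary package; the `Chunk` type and its layout (a tag byte, then either a fixed 32-bit field or a 16-bit length followed by that many bytes) are hypothetical, chosen only to show a discriminator and a length-prefixed field.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getByteString, getWord16be, getWord32be, getWord8)
import Data.Binary.Put (putByteString, putWord16be, putWord32be, putWord8)
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Hypothetical file element: a tag byte says which variant follows.
data Chunk
  = Header Word32      -- tag 0: fixed-length 32-bit big-endian field
  | Name BS.ByteString -- tag 1: 16-bit length, then that many bytes
  deriving (Show, Eq)

instance Binary Chunk where
  -- Writing: emit the discriminator, then the variant's fields.
  put (Header n) = putWord8 0 >> putWord32be n
  put (Name s) = do
    putWord8 1
    putWord16be (fromIntegral (BS.length s))
    putByteString s

  -- Reading: the discriminator decides which field layout to expect.
  get = do
    tag <- getWord8
    case tag of
      0 -> Header <$> getWord32be
      1 -> do
        len <- getWord16be
        Name <$> getByteString (fromIntegral len)
      _ -> fail ("unknown chunk tag " ++ show tag)
```

`decode (encode (Name (BS.pack [104, 105]))) :: Chunk` gives back `Name "hi"`. Note that `decode` throws on malformed input; `decodeOrFail` returns an `Either` instead.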

On the other hand if your binary data doesn't fit into this kind of world then you will need attoparsec. I've never used that, so I can't comment further, but this blog post is very positive.
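When there is no discriminator telling you what comes next, attoparsec lets you simply try alternatives with `<|>`. A minimal sketch, assuming the attoparsec package, that tells GIF and PNG files apart by their signatures and reads the image dimensions; the `Image` type and the `u16le`/`u32be`/`gif`/`png`/`image` helpers are hypothetical, while the header layouts (GIF's little-endian 16-bit width and height after the 6-byte signature, PNG's IHDR chunk with big-endian 32-bit width and height) follow the two formats.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString (Parser, anyWord8, parseOnly, string)
import qualified Data.Attoparsec.ByteString as A
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS

data Image = Gif Int Int | Png Int Int
  deriving (Show, Eq)

-- Little-endian 16-bit integer: low byte first.
u16le :: Parser Int
u16le = do
  lo <- anyWord8
  hi <- anyWord8
  pure (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

-- Big-endian 32-bit integer: fold four bytes, most significant first.
u32be :: Parser Int
u32be = foldl (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0 . BS.unpack <$> A.take 4

gif :: Parser Image
gif = do
  _ <- string "GIF87a" <|> string "GIF89a"
  Gif <$> u16le <*> u16le

png :: Parser Image
png = do
  _ <- string (BS.pack [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
  _ <- u32be           -- IHDR chunk length (13)
  _ <- string "IHDR"
  Png <$> u32be <*> u32be

-- Try each format in turn; a failed branch is rolled back automatically.
image :: Parser Image
image = gif <|> png
```

No `try` is needed (unlike in Parsec), since attoparsec always backtracks on failure; `parseOnly image` returns `Right (Gif 320 240)` for a GIF89a header declaring a 320x240 screen.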

Paul Johnson
+4  A: 

The key tools for parsing binary files are Data.Binary, cereal, and attoparsec.

Binary is the most general solution, cereal can be great for limited data sizes, and attoparsec is perfectly fine for, e.g., packet parsing. All of these are aimed at very high performance, unlike Parsec. There are many examples on Hackage as well.

Don Stewart
I think attoparsec is the way to go, but I'm having difficulty finding examples of parsing actual binary data. The only example is the RFC2616.hs included in the repository, but that still parses text, not binary. Any suggestions?
me2
attoparsec has only recently been suggested for parsing binary data. At work we go with Data.Binary and cereal. There are far more examples there, and that's what they are designed for. attoparsec may be more general than you need.
Don Stewart