I need to read a PDF, make some transformations (generate TOC bookmarks), and write it back.

I found HPDF (http://hackage.haskell.org/package/HPDF), but it only mentions generating PDFs, not parsing them (although I could have missed it).

I chose Haskell purely for (self-)educational purposes.

A: 

Here's a Haskell binding to parts of xpdf: http://hackage.haskell.org/package/pdf2line

ja
+1  A: 

There are a few tools for PDF manipulation, though they seem biased towards generation rather than parsing:

Pandoc is a great cross-markup library, but doesn't support PDF parsing (it does support PDF generation from a variety of formats).

There's also:

I'm not sure we have a good parsing tool yet.

Don Stewart
+1  A: 

Also as a learning exercise, I started a PDF parsing library in Haskell, but it's incomplete and has been languishing a bit from lack of attention. I'd be happy to share it with you, and would love feedback, improvements, etc. It's not currently hosted on Hackage, but if you're interested in working with an incomplete implementation, let me know and I'll ask some colleagues for advice on getting it up there.

Dylan McNamee
I am far too junior for such a quest. But thanks anyway, I'll keep this in mind for future.
artemave
I'd be happy to work with you on it. Its current state is that it takes a PDF file and produces an AST-like representation, which can be manipulated. I've also got an AST pretty-printer that produces a valid PDF file.
Dylan McNamee
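To make the parse, manipulate, pretty-print idea from the comment above more concrete, here is a minimal sketch of what an AST for PDF objects and a matching pretty-printer might look like. It is not the library mentioned in the thread: every type and function name (`PdfObject`, `render`, `setKey`, `outlineItem`) is hypothetical, the parser is left out entirely, and only a few object kinds are covered. The `/Title`, `/Parent`, and `/Dest` keys are the ones the PDF specification uses for outline (bookmark) items.

```haskell
-- Hypothetical AST for a subset of PDF objects (illustrative names only).
data PdfObject
  = PdfNull
  | PdfBool Bool
  | PdfInt Integer
  | PdfName String                  -- /Name
  | PdfString String                -- (literal string)
  | PdfArray [PdfObject]            -- [ ... ]
  | PdfDict [(String, PdfObject)]   -- << /Key value ... >>
  | PdfRef Int Int                  -- indirect reference, e.g. "12 0 R"
  deriving (Eq, Show)

-- Pretty-print an object back to PDF syntax.
render :: PdfObject -> String
render PdfNull       = "null"
render (PdfBool b)   = if b then "true" else "false"
render (PdfInt n)    = show n
render (PdfName n)   = '/' : n
render (PdfString s) = "(" ++ concatMap escape s ++ ")"
  where
    -- Parentheses and backslashes must be escaped inside literal strings.
    escape c
      | c `elem` "()\\" = ['\\', c]
      | otherwise       = [c]
render (PdfArray xs) = "[" ++ unwords (map render xs) ++ "]"
render (PdfDict kvs) =
  "<< " ++ unwords ['/' : k ++ " " ++ render v | (k, v) <- kvs] ++ " >>"
render (PdfRef n g)  = show n ++ " " ++ show g ++ " R"

-- Insert or replace a key in a dictionary; other objects are returned unchanged.
setKey :: String -> PdfObject -> PdfObject -> PdfObject
setKey k v (PdfDict kvs) = PdfDict (filter ((/= k) . fst) kvs ++ [(k, v)])
setKey _ _ other         = other

-- Build a bookmark (outline item) that jumps to a page and fits it to the window.
outlineItem :: String -> PdfObject -> PdfObject -> PdfObject
outlineItem title parent page = PdfDict
  [ ("Title",  PdfString title)
  , ("Parent", parent)
  , ("Dest",   PdfArray [page, PdfName "Fit"])
  ]
```

For example, `render (outlineItem "Intro" (PdfRef 2 0) (PdfRef 5 0))` yields `<< /Title (Intro) /Parent 2 0 R /Dest [5 0 R /Fit] >>`. A real tool would still need to parse the existing file, allocate object numbers for the new items, link them with `/First`, `/Last`, `/Next`, and `/Prev`, and rewrite the cross-reference table.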
Also, I can't seem to comment on the "waah, the PDF ISO spec is expensive" remark, but I found the free documents at http://www.adobe.com/devnet/pdf/ to be sufficient for my PDF parsing needs.
Dylan McNamee