I want a Python function that takes a PDF and returns a list of the text of the note annotations in the document. I have looked at python-poppler (https://code.launchpad.net/~poppler-python/poppler-python/trunk) but I cannot figure out how to get it to give me anything useful.

I found the get_annot_mapping method and modified the demo program provided to call it via self.current_page.get_annot_mapping(), but I have no idea what to do with an AnnotMapping object. It seems to not be fully implemented, providing only the copy method.

If there are any other libraries that provide this function, that's fine as well.

+1  A: 

I have never used this, nor have I needed this kind of feature, but I found PDFMiner. This link has information about basic usage; maybe this is what you are looking for?

zeroDivisible
While that might be useful if I wanted to extract all of the text from a PDF, I just want to extract the annotations. The reason I mentioned poppler is that it does provide this ability rather easily (http://cgit.freedesktop.org/poppler/poppler/tree/glib/poppler-annot.h). But I wanted to use Python. I found the python-poppler binding project, but it does not seem to provide full access to the annotations. My question boils down to "Am I doing it wrong, or is the library incomplete?" and "Are there any other libraries that provide the same functionality?"
davidb
+1  A: 

It turns out the bindings were incomplete. This is now fixed: https://bugs.launchpad.net/poppler-python/+bug/397850
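
For anyone landing here, a minimal sketch of what the requested function might look like with the fixed bindings. It assumes the Python objects mirror the poppler-glib API (`get_n_pages`, `get_page`, `get_annot_mapping`, a mapping's `annot` attribute, `get_contents`, `get_annot_type`); the function name `note_texts` and the `type_nick` parameter are made up for illustration, and the `value_nick` attribute on the annotation type is an assumption about how the enum is exposed. It has not been checked against every version of the bindings.

```python
def note_texts(document, type_nick=None):
    """Collect the text contents of annotations in a poppler-style document.

    `document` is assumed to expose get_n_pages() and get_page(i), and each
    page is assumed to expose get_annot_mapping(), as in poppler-glib.
    If `type_nick` is given, only annotations whose type nick matches it are
    kept (assumption: the type enum has a `value_nick` attribute).
    """
    texts = []
    for page_index in range(document.get_n_pages()):
        page = document.get_page(page_index)
        for mapping in page.get_annot_mapping():
            annot = mapping.annot
            if type_nick is not None:
                if annot.get_annot_type().value_nick != type_nick:
                    continue
            contents = annot.get_contents()
            # Skip annotations that carry no text (e.g. plain links).
            if contents:
                texts.append(contents)
    return texts
```

With the bindings installed, the document would presumably come from something like `poppler.document_new_from_file(uri, None)`, where `uri` is a `file://` URI, and passing `type_nick="text"` should restrict the result to note annotations if the type nicks follow poppler's naming.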

davidb