I want to be able to read the content of PDF files. I need to do that in C on Linux.

The closest I could get to this was Haru, but I think Haru can only create PDFs, not read them (I'm not 100% sure).

PS: I only need the plain text from the PDF files.

+1  A: 

How well do you need to parse them? Just extracting the strings should be relatively easy; fully accurate rendering is harder. Perhaps take a look at the source of Evince or Ghostscript.
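A minimal sketch of the "just extract the strings" route, assuming the page content stream is already uncompressed and the font uses a simple single-byte, ASCII-like encoding. It copies the bytes of every PDF literal string `( ... )`, decodes the standard backslash escapes, and separates separately shown strings with a space. Pieces inside a `TJ` array are joined without a space, since they are usually kerned fragments of one word. The function name `extract_literal_strings` is hypothetical; hex strings `<...>`, ToUnicode maps, text positioning and inline images are deliberately ignored, so treat this as an illustration, not a real extractor.

```c
#include <stddef.h>
#include <string.h>

/* Append one byte if there is room, always keeping space for the final '\0'. */
static void put_byte(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 < cap)
        out[(*n)++] = c;
}

/*
 * Hypothetical helper: naive text extraction from an UNCOMPRESSED content
 * stream. Copies every literal string "( ... )" with escapes decoded and
 * adds a space after each string that is not inside a TJ array "[ ... ]".
 * Assumes a single-byte, ASCII-like font encoding; ignores hex strings,
 * ToUnicode maps, positioning and inline images.
 * Returns the number of bytes written; out is always NUL-terminated.
 */
size_t extract_literal_strings(const char *in, size_t len, char *out, size_t cap)
{
    size_t n = 0;
    int in_array = 0;

    if (cap == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '%') {                 /* comment: skip to end of line */
            while (i < len && in[i] != '\n' && in[i] != '\r')
                i++;
        } else if (c == '[') {
            in_array = 1;
        } else if (c == ']') {
            in_array = 0;
            put_byte(out, cap, &n, ' ');
        } else if (c == '(') {
            int depth = 1;              /* balanced parentheses may nest */
            for (i++; i < len && depth > 0; i++) {
                c = in[i];
                if (c == '\\' && i + 1 < len) {
                    char e = in[++i];
                    switch (e) {
                    case 'n': put_byte(out, cap, &n, '\n'); break;
                    case 'r': put_byte(out, cap, &n, '\r'); break;
                    case 't': put_byte(out, cap, &n, '\t'); break;
                    case 'b': put_byte(out, cap, &n, '\b'); break;
                    case 'f': put_byte(out, cap, &n, '\f'); break;
                    case '\r':                  /* backslash-EOL: line continuation */
                        if (i + 1 < len && in[i + 1] == '\n')
                            i++;
                        break;
                    case '\n':
                        break;
                    default:
                        if (e >= '0' && e <= '7') {   /* \ddd octal, up to 3 digits */
                            int v = e - '0';
                            for (int k = 0; k < 2 && i + 1 < len
                                 && in[i + 1] >= '0' && in[i + 1] <= '7'; k++)
                                v = v * 8 + (in[++i] - '0');
                            put_byte(out, cap, &n, (char)(v & 0xFF));
                        } else {
                            put_byte(out, cap, &n, e); /* \( \) \\ or unknown: drop '\' */
                        }
                    }
                } else if (c == '(') {
                    depth++;
                    put_byte(out, cap, &n, c);
                } else if (c == ')') {
                    if (--depth > 0)
                        put_byte(out, cap, &n, c);
                } else {
                    put_byte(out, cap, &n, c);
                }
            }
            i--;                        /* outer loop's i++ moves past ')' */
            if (!in_array)
                put_byte(out, cap, &n, ' ');
        }
    }
    while (n > 0 && out[n - 1] == ' ')  /* trim trailing separator */
        n--;
    out[n] = '\0';
    return n;
}
```

For example, feeding it the content-stream fragment `BT /F1 12 Tf 72 712 Td (Hello) Tj [(Wor) -30 (ld)] TJ ET` yields `Hello World`. In real files the content streams are usually Flate-compressed, so they would have to be inflated (for instance with zlib) before a scan like this sees any readable text.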

This is for C++, but it might be a good starting point for understanding PDF structure: http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx (sorry, wrong link before)
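As a rough illustration of that structure: page text lives in stream objects, whose data sits between the `stream` keyword (followed by CRLF or LF) and `endstream`. The sketch below, with the hypothetical names `find_streams` and `stream_span`, naively records the byte range of each stream body in a PDF that has been read into memory. It assumes searching for `endstream` is good enough; a proper parser would read the `/Length` entry of the stream dictionary instead, and the located data still has to be decoded (typically FlateDecode) before any text can be pulled out of it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: byte range of one stream body inside the PDF buffer. */
typedef struct {
    size_t start;
    size_t len;
} stream_span;

/* Search a binary buffer for a C-string needle (a portable stand-in for
 * the GNU-only memmem). Returns a pointer to the first match or NULL. */
static const char *find_bytes(const char *hay, size_t hlen, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || hlen < nlen)
        return NULL;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    return NULL;
}

/*
 * Naively record where each "stream ... endstream" body lies in a PDF file
 * held in memory. A real parser would use the /Length entry of the stream
 * dictionary rather than searching for "endstream".
 * Returns the number of spans stored (at most max).
 */
size_t find_streams(const char *pdf, size_t len, stream_span *spans, size_t max)
{
    size_t count = 0, pos = 0;

    while (count < max) {
        const char *s = find_bytes(pdf + pos, len - pos, "stream");
        if (s == NULL)
            break;
        size_t off = (size_t)(s - pdf);
        if (off >= 3 && memcmp(s - 3, "end", 3) == 0) {  /* an "endstream", skip it */
            pos = off + 6;
            continue;
        }
        size_t body = off + 6;
        /* the keyword is followed by CRLF or LF before the data starts */
        if (body < len && pdf[body] == '\r')
            body++;
        if (body < len && pdf[body] == '\n')
            body++;
        const char *e = find_bytes(pdf + body, len - body, "endstream");
        if (e == NULL)
            break;
        size_t end = (size_t)(e - pdf);
        /* drop the end-of-line marker that normally precedes "endstream" */
        if (end > body && pdf[end - 1] == '\n')
            end--;
        if (end > body && pdf[end - 1] == '\r')
            end--;
        spans[count].start = body;
        spans[count].len = end - body;
        count++;
        pos = (size_t)(e - pdf) + 9;   /* continue after "endstream" */
    }
    return count;
}
```

Searching for the keywords keeps the sketch short, but compressed data could in principle contain the bytes `endstream`, which is exactly why real readers trust `/Length` and the cross-reference table instead.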

Martin Beckett
I only need the plain text from the PDF files.
Rui Carneiro
I don't believe this will work for C.
TStamper
Sorry, I pasted the wrong link; I had too many windows open!
Martin Beckett
+2  A: 

Check out libpoppler. I've never used it for extracting text, only for querying PDF attributes, but it's pretty easy to use.

eduffy
I think libpoppler is too "big" for what I want. It uses Qt and other stuff that I think is unnecessary.
Rui Carneiro
Poppler has optional frontends for GLib and Qt (to fit nicely into their object systems), but neither is required.
eduffy
OK! I saw that it is already in the Ubuntu repositories. I will take a look.
Rui Carneiro
A: 

Another possibility, though I've never used it, is VersyPDF. It claims to allow you to edit PDFs ... http://versypdf.sybrex-systems-ltd.qarchive.org/

I forgot to mention that working on Linux is mandatory.
Rui Carneiro
A: 

Hi everyone,

I am a C application programmer. I want to read PDF files that are currently open, using C code. I am able to read open text and JPG files, but not open PDF files. I can read PDF files that are not open elsewhere, but if I try to open a PDF file manually while my C code is reading it, the file won't open.

Please suggest a solution for the above problem.

Thanks in advance

Regards, Sunil Kumar G

I think you should ask a separate question instead of posting it here. Try this: http://stackoverflow.com/questions/ask
Rui Carneiro