How can I read the content of a PDF with iTextSharp, using the PdfReader class? My PDF may contain plain text or images of text.

+1  A: 

Hi user221185,

Check these links:

http://www.dotnetspider.com/forum/156957-read-pdf-content-vb-net.aspx

http://jadn.co.uk/w/ReadPdfUsingCsharp.htm

http://forums.asp.net/p/1408202/3097463.aspx#3097463

The link below contains iTextSharp tutorials:

http://itextsharp.sourceforge.net/tutorial/ch01.html

If my answer solved your problem, please accept it and vote it up. Thanks.

Emaad Ali
That's not really an answer; it's more of a "here is some info, work it out yourself".
Gordon Carpenter-Thompson
+4  A: 

You can't read and parse the contents of a PDF with iTextSharp the way you'd like to.

From iTextSharp's SourceForge tutorial:

You can't 'parse' an existing PDF file using iText, you can only 'read' it page per page.

What does this mean?

The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText. Post your question on the newsgroup news://comp.text.pdf and maybe you will get some answers from people that have built tools that can parse PDF and extract some of its contents, but don't expect tools that will perform a bullet-proof conversion to structured text.
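To make the "canvas" point concrete, here is a toy sketch in Python (not iTextSharp). It assumes a tiny, hand-written, uncompressed page content stream. `BT`, `Td`, `Tj` and `ET` are real PDF text operators, but `SAMPLE_STREAM`, `extract_fragments` and `guess_lines` are hypothetical names made up for this illustration, and the parser ignores almost everything a real PDF contains (compression, `TJ`, `Tm`, font encodings, string escapes). What it shows is that a page only yields strings at positions; grouping them into lines has to be guessed from coordinates.

```python
import re
from typing import List, Tuple

# Hypothetical, hand-written content stream for illustration only.
# Real streams are usually compressed and use many more operators
# (Tm, TJ, T*, font changes, ...) that this toy parser ignores.
SAMPLE_STREAM = """
BT
/F1 12 Tf
72 700 Td
(Quarterly) Tj
60 0 Td
(report) Tj
-60 -14 Td
(for 2009) Tj
ET
"""

# "tx ty Td": move to the start of the next line, offset from the current line start
TD = re.compile(r"^(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td$")
# "(string) Tj": show a string (escapes and hex strings are not handled)
TJ = re.compile(r"^\((.*)\)\s*Tj$")

Fragment = Tuple[float, float, str]  # (x, y, text)


def extract_fragments(stream: str) -> List[Fragment]:
    """Return (x, y, text) for each Tj string; the stream holds no paragraph info."""
    fragments: List[Fragment] = []
    line_x, line_y = 0.0, 0.0
    for raw in stream.splitlines():
        op = raw.strip()
        if op == "BT":
            # BT starts a text object; the line matrix starts at the origin
            line_x, line_y = 0.0, 0.0
        elif m := TD.match(op):
            # Td is relative to the start of the current line
            line_x += float(m.group(1))
            line_y += float(m.group(2))
        elif m := TJ.match(op):
            # Consecutive Tj without a Td would need glyph widths; not handled here
            fragments.append((line_x, line_y, m.group(1)))
    return fragments


def guess_lines(fragments: List[Fragment], tolerance: float = 2.0) -> List[str]:
    """Heuristic: fragments whose y values are within `tolerance` share a visual line."""
    lines: List[List[Fragment]] = []
    # PDF y grows upward, so sort top-to-bottom, then left-to-right
    for frag in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if lines and abs(lines[-1][0][1] - frag[1]) <= tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Joining with a space is a guess: this stream stores no word spacing
    return [" ".join(text for _, _, text in line) for line in lines]


if __name__ == "__main__":
    frags = extract_fragments(SAMPLE_STREAM)
    print(frags)
    print(guess_lines(frags))
```

Running it prints the three positioned fragments and then `['Quarterly report', 'for 2009']`. The space between "Quarterly" and "report" comes from the heuristic, not from the file, and text that exists only as an image would produce no `Tj` strings at all.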

Jay Riggs