How can I read PDF content with iTextSharp using the PdfReader class? My PDF may include plain text or images of text.
Hi user221185,
Check these links:
http://www.dotnetspider.com/forum/156957-read-pdf-content-vb-net.aspx
http://jadn.co.uk/w/ReadPdfUsingCsharp.htm
http://forums.asp.net/p/1408202/3097463.aspx#3097463
The link below contains iTextSharp tutorials:
http://itextsharp.sourceforge.net/tutorial/ch01.html
If my answer solved your problem, please accept it and vote it up. Thanks.
You can't read and parse the contents of a PDF with iTextSharp the way you'd like to.
From iTextSharp's SourceForge tutorial:
You can't 'parse' an existing PDF file using iText, you can only 'read' it page per page.
What does this mean?
The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText. Post your question on the newsgroup news://comp.text.pdf and maybe you will get some answers from people that have built tools that can parse PDF and extract some of its contents, but don't expect tools that will perform a bullet-proof conversion to structured text.
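To make the "canvas" point concrete, here is a minimal Python sketch (not iTextSharp code). It assumes a hand-written, simplified page content stream; real streams are usually compressed and may use custom font encodings, so treat CONTENT_STREAM, TEXT_RUN and extract_runs as hypothetical names for illustration only. It shows that what you get back is a list of positioned strings, with nothing marking words, sentences or paragraphs.

```python
import re

# Hand-written, simplified content stream (an assumption for illustration).
# Tm sets the text position, Tj shows a literal string at that position.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
1 0 0 1 72 700 Tm
(Hello wor) Tj
1 0 0 1 131 700 Tm
(ld, this is) Tj
1 0 0 1 72 686 Tm
(one sentence.) Tj
ET
"""

# Matches "a b c d x y Tm" followed by "(text) Tj" and captures x, y, text.
# This only handles the simple layout used above, not general PDF syntax.
TEXT_RUN = re.compile(
    rb"[-\d.]+ [-\d.]+ [-\d.]+ [-\d.]+ ([-\d.]+) ([-\d.]+) Tm\s*\((.*?)\) Tj"
)


def extract_runs(stream: bytes):
    """Return (x, y, text) for every positioned string in the stream."""
    return [
        (float(x), float(y), text.decode("latin-1"))
        for x, y, text in TEXT_RUN.findall(stream)
    ]


runs = extract_runs(CONTENT_STREAM)
for x, y, text in runs:
    print(f"x={x:6.1f} y={y:6.1f} text={text!r}")
```

Note that the first string breaks in the middle of a word and the sentence is spread over three separate runs. Putting them back together means guessing from coordinates, which is why the quoted tutorial says you can't reliably reconstruct phrases, paragraphs or tables. Text that exists only as a scanned image would not appear in such a stream at all.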