I want to parse a PDF file without using any software or libraries such as iTextSharp. I am developing a project in C#.

+1  A: 

I don't think this is a practical idea. PDF is a complicated file format, and writing a parser from scratch will almost certainly be more effort than it's worth.

ctford
+1  A: 

If you don't want to use 3rd party libraries, you'll have to implement the exact same thing yourself. I can't think of any rational reason you'd want to do that, unless it's purely a learning exercise...

http://www.pdf-tools.com/asp/pdf-specifications.asp

Good luck :)

rjohnston
A: 

I agree with ctford, but if you do want to write a PDF parser from scratch, you need to start with the PDF format specification. It can be downloaded from the link below, although I don't know whether it is the most up-to-date documentation: http://www.pdf-tools.com/asp/pdf-specifications.asp

Konamiman
A: 

Well, if you have read the PDF file format specs and know C#, then do it. What do you expect, that someone is going to paste code here that will solve your question? Oh man! There are vendors who have already done it, so why waste your time and money? Check O2 Solutions.

abmv
+1  A: 

According to the people at iText, parsing a PDF properly is very difficult.

If you are looking to find certain text or something like that, it may be workable.

I have had success changing JavaScript code in PDFs by doing find-and-replace on the file contents.

The biggest problem with PDFs, I believe, is that objects are looked up through an index (the cross-reference table) rather than (necessarily) stored inline where they are used. An added problem is that when you Save (not Save As) in Acrobat, the objects that have changed are appended to the end of the file, leaving the previous versions of those objects in place. That is why it is difficult.
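
To make the indexing point concrete, here is a minimal C# sketch (not a parser, just a peek at the file structure). It assumes the classic layout in which each saved revision ends with `startxref`, a byte offset, and `%%EOF`. The names `PdfTail`, `CountRevisions`, and `FindLastXrefOffset` are made up for illustration, and the `%%EOF` count is only a rough estimate, since those bytes could also occur inside a stream by coincidence.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; it does not parse PDF objects.
static class PdfTail
{
    // Latin-1 maps every byte to exactly one char, so string indexes equal byte offsets.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Counts "%%EOF" markers. Each appended (incremental) save normally adds one,
    // so a count above 1 suggests older object versions are still in the file.
    public static int CountRevisions(byte[] pdf)
    {
        string text = Latin1.GetString(pdf);
        int count = 0;
        int i = text.IndexOf("%%EOF", StringComparison.Ordinal);
        while (i >= 0)
        {
            count++;
            i = text.IndexOf("%%EOF", i + 5, StringComparison.Ordinal);
        }
        return count;
    }

    // Returns the number written after the last "startxref" keyword, which is the
    // byte offset of the newest cross-reference section, or -1 if it is not found.
    public static long FindLastXrefOffset(byte[] pdf)
    {
        string text = Latin1.GetString(pdf);
        int pos = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (pos < 0) return -1;

        int i = pos + "startxref".Length;
        while (i < text.Length && char.IsWhiteSpace(text[i])) i++;   // skip the line break
        int start = i;
        while (i < text.Length && text[i] >= '0' && text[i] <= '9') i++;
        return i > start ? long.Parse(text.Substring(start, i - start)) : -1;
    }
}

static class Program
{
    static void Main(string[] args)
    {
        // args[0] is the path to whatever PDF you want to inspect.
        byte[] pdf = File.ReadAllBytes(args[0]);
        Console.WriteLine("Revisions (approx.): " + PdfTail.CountRevisions(pdf));
        Console.WriteLine("Newest xref offset:  " + PdfTail.FindLastXrefOffset(pdf));
    }
}
```

Even this much only tells you where the newest index starts. A real reader still has to follow that offset, read the cross-reference entries, and chase earlier sections through the trailer, which is where the effort goes.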

Tom Hubbard
+1  A: 

I don't see a question here

A bigger and better question for you @Rahul is:

Do you really want to reinvent the wheel and spend lots of time (and consequently money) developing something others have done already?

They also spent lots of time debugging their code. Compare the time you would put into this with the cost of a third-party parser. I don't really see the point in doing it yourself, unless you actually think you have a much better solution up your sleeve that will bring you nice revenue in the end.

And if you're just looking for code, inspect existing assemblies that do this using tools like .NET Reflector. But copying code may not be permitted (check their licenses).

Robert Koritnik