I know of several tools/libraries that can count the pages in a PDF file, but I want to know if this is possible just by opening the file as a text file and looking for a keyword.

A: 

The xpdf utilities package (called xpdf-utils in Debian) includes an application called pdfinfo. It prints the number of pages in the file, among other data.

http://www.linuxquestions.org/questions/programming-9/how-to-find-pdf-page-count-699113/
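If you do end up calling pdfinfo from code, here is a minimal C# sketch. It assumes pdfinfo is installed and on the PATH, and relies only on the "Pages:" line of its text output. The class name PdfInfoPageCount is hypothetical. Parsing is kept in a separate method so it can be checked without running the tool.

```csharp
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the external pdfinfo tool (assumed to be on PATH)
// and extracts the page count from its "Pages:" output line.
public static class PdfInfoPageCount
{
    // Returns the number on the "Pages:" line, or null if there is no such line.
    public static int? ParsePages(string pdfinfoOutput)
    {
        Match m = Regex.Match(pdfinfoOutput, @"^Pages:\s+(\d+)\s*$", RegexOptions.Multiline);
        return m.Success ? int.Parse(m.Groups[1].Value) : (int?)null;
    }

    // Starts pdfinfo on the given file, captures stdout and parses it.
    public static int? Run(string fileName)
    {
        var psi = new ProcessStartInfo("pdfinfo", "\"" + fileName + "\"")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        using (Process p = Process.Start(psi))
        {
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            return ParsePages(output);
        }
    }
}
```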

Gadolin
Sorry, not what I'm looking for. Edited my question's description to clarify further.
Chry Cheng
+2  A: 

Have a look at this: http://www.freevbcode.com/ShowCode.asp?ID=8153
Edit: that does not work; it may be too old.
Found this:

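using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Wrapper class added so the snippet compiles on its own.
public static class PdfUtil
{
    public static int GetNoOfPagesPDF(string fileName)
    {
        // Latin-1 maps every byte to exactly one char, so binary stream data
        // cannot garble the ASCII keywords we search for. ReadAllText also
        // closes the file, which the original FileStream/StreamReader did not.
        string pdfText = File.ReadAllText(fileName, Encoding.GetEncoding("ISO-8859-1"));
        // Count page objects ("/Type /Page"); \b rejects "/Type /Pages" (page-tree
        // nodes) and "/Type /PageLabel". Pages stored inside compressed object
        // streams (PDF 1.5+) are not visible to this text search.
        Regex regx = new Regex(@"/Type\s*/Page\b");
        return regx.Matches(pdfText).Count;
    }
}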

PS: Tested! It works. See here: source

pinichi
FYI: a PDF can be written so that changes are appended to the existing file. If you "delete" pages by appending a new catalog with fewer pages (leaving the old pages in place), this solution will produce incorrect results.
plinth
+1  A: 

[Edit: based on the edited question]

It is possible by reading it as a text file and doing some minimal parsing.

If you read the PDF yourself, you will need to do the parsing. Each page in a PDF is represented by a page object.

The following gives a short overview of how the PDF specification describes pages, along with a link to the PDF spec.
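A minimal C# sketch of that kind of parsing, under stated assumptions. Instead of counting page objects, it follows the trailer's /Root entry to the document catalog, then the catalog's /Pages entry to the root of the page tree, and reads that node's /Count (the number of leaf pages below it). It takes the last /Root reference and the last definition of each object, so objects rewritten by an incremental update take precedence, which avoids the problem plinth describes. It assumes the catalog and page-tree root appear as plain-text "N G obj ... endobj" objects; objects packed into compressed object streams (PDF 1.5+) are not found. PdfPageTreeCounter and its methods are hypothetical names.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not from any library): reads the page count from the
// /Count entry of the root page-tree node, following trailer -> /Root -> /Pages.
public static class PdfPageTreeCounter
{
    public static int? CountFromFile(string fileName) =>
        CountFromText(File.ReadAllText(fileName, Encoding.GetEncoding("ISO-8859-1")));

    public static int? CountFromText(string pdfText)
    {
        // The last /Root reference in the file belongs to the newest trailer.
        Match root = Regex.Matches(pdfText, @"/Root\s+(\d+)\s+(\d+)\s+R")
                          .Cast<Match>().LastOrDefault();
        if (root == null) return null;

        string catalog = LastObjectBody(pdfText, root.Groups[1].Value, root.Groups[2].Value);
        if (catalog == null) return null;

        // The catalog's /Pages entry points at the root of the page tree.
        Match pages = Regex.Match(catalog, @"/Pages\s+(\d+)\s+(\d+)\s+R");
        if (!pages.Success) return null;

        string pagesRoot = LastObjectBody(pdfText, pages.Groups[1].Value, pages.Groups[2].Value);
        if (pagesRoot == null) return null;

        // /Count on the root node is the total number of pages in the document.
        Match count = Regex.Match(pagesRoot, @"/Count\s+(\d+)");
        return count.Success ? int.Parse(count.Groups[1].Value) : (int?)null;
    }

    // Returns the body of the LAST "num gen obj ... endobj" definition, or null.
    // (?<!\d) keeps "1 0 obj" from matching inside "21 0 obj".
    private static string LastObjectBody(string text, string num, string gen)
    {
        string pattern = @"(?<!\d)" + num + @"\s+" + gen + @"\s+obj(.*?)endobj";
        Match m = Regex.Matches(text, pattern, RegexOptions.Singleline)
                       .Cast<Match>().LastOrDefault();
        return m?.Groups[1].Value;
    }
}
```

For example, in a file whose page-tree root (say object 2) was first written with /Count 2 and later re-appended with /Count 1, this returns 1, whereas counting "/Type /Page" objects would still find both old page objects.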

pyfunc
Preferring pinichi's answer as it has working code. Voting up your answer because it's very helpful.
Chry Cheng