I know of several tools and libraries that can do this, but I want to know whether it is possible by just opening the file as a text file and looking for a keyword.
A:
The xpdf utilities package (called xpdf-utils in Debian) includes an application called pdfinfo. It prints the number of pages in the file, among other data.
http://www.linuxquestions.org/questions/programming-9/how-to-find-pdf-page-count-699113/
Gadolin
2010-10-05 06:50:13
Sorry, not what I'm looking for. Edited my question's description to clarify further.
Chry Cheng
2010-10-05 06:52:20
+2
A:
Have a look at this: http://www.freevbcode.com/ShowCode.asp?ID=8153
Edit: that code does not work; it may be too old.
I found this instead:
using System.IO;
using System.Text.RegularExpressions;

// Place inside any class. Counts "/Type /Page" entries in the raw file text.
public static int GetNoOfPagesPDF(string fileName)
{
    // Dispose the stream and reader even if reading throws.
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (StreamReader r = new StreamReader(fs))
    {
        string pdfText = r.ReadToEnd();
        // [^s] excludes "/Type /Pages", which marks page-tree nodes rather than pages.
        Regex regx = new Regex(@"/Type\s*/Page[^s]");
        return regx.Matches(pdfText).Count;
    }
}
pinichi
2010-10-05 06:52:53
FYI - PDF can be written such that you can append changes to the document to the existing file, so if you "delete" pages by appending a new catalog with fewer pages (leaving the old pages in place), this solution will produce incorrect results.
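To make the failure mode concrete, here is a rough Python sketch with a hand-written, heavily simplified PDF body. The byte strings are made up for illustration, and a real incremental update would also append a new xref section and trailer, which are left out here. The regex is the same one as in the answer above.

```python
import re

# Same pattern as the answer: "/Type /Page" not followed by "s".
PAGE_RE = re.compile(rb"/Type\s*/Page[^s]")

def naive_page_count(data: bytes) -> int:
    """Count leaf page dictionaries by keyword, ignoring document structure."""
    return len(PAGE_RE.findall(data))

# Hypothetical two-page document (simplified: no xref table, no trailer).
original = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
)

# Appended update that redefines object 2 so only page object 3 remains.
# Object 4 stays in the file but is no longer reachable.
update = b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"

print(naive_page_count(original))           # 2
print(naive_page_count(original + update))  # still 2, but only 1 page is live
```

The keyword count cannot tell that the second page object has been orphaned, which is why it overcounts after such an update.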
plinth
2010-10-11 17:30:26
+1
A:
[Edit: based on the edited question]
It is possible by reading it as a text file and doing some minimal parsing.
If you read the PDF yourself, you will need to do the parsing. Each page in a PDF is represented by a page object.
The following gives a short overview of how the PDF specification describes pages, along with a link to the PDF spec.
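As a rough illustration of that kind of minimal parsing, here is a Python sketch. Instead of counting page objects it reads the /Count entry of the root page-tree node (the "/Type /Pages" dictionary that has no /Parent). It assumes the objects are stored uncompressed (no object streams) and that the dictionaries are simple enough for regexes; the function name count_pages_from_tree is my own and not from any library.

```python
import re
from typing import Optional

# "N G obj ... endobj" blocks; DOTALL so the body may span lines.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Page-tree nodes are "/Type /Pages"; leaf pages are "/Type /Page".
PAGES_RE = re.compile(rb"/Type\s*/Pages\b")
COUNT_RE = re.compile(rb"/Count\s+(\d+)")

def count_pages_from_tree(data: bytes) -> Optional[int]:
    """Return /Count of the root page-tree node, or None if none is found."""
    root_count = None
    for match in OBJ_RE.finditer(data):
        body = match.group(1)
        # The root node is the only page-tree node without a /Parent entry.
        if PAGES_RE.search(body) and b"/Parent" not in body:
            count = COUNT_RE.search(body)
            if count:
                # Keep the last definition seen, so an appended
                # redefinition overrides the earlier one.
                root_count = int(count.group(1))
    return root_count

# Hand-written sample for illustration (no xref table or trailer).
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
)
print(count_pages_from_tree(sample))  # 2
```

For a real file you would pass open(path, "rb").read(). Files that store the page tree inside compressed object streams will return None here, so treat this as a heuristic rather than a replacement for a real parser.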
pyfunc
2010-10-05 06:53:07
Preferring pinichi's answer as it has working code. Voting up your answer because it's very helpful.
Chry Cheng
2010-10-05 07:19:53