Hi,

I need to determine the number of pages in a specified PDF file using C# code (.NET 2.0). The PDF file will be read from the file system, and not from a URL. Does anyone have any pointers on how this could be done? Note, Adobe Acrobat Reader is installed on the PC where this check will be carried out.

Thanks in advance for your answers.

Kind Regards, Andy.

+3  A: 

You'll need a PDF API for C#, maybe iTextSharp.

I think there are better ones, though...

darkdog
So are you saying "here's what I recommend, but actually there are better ways to do this"?
Mitch Wheat
Thanks, Darkdog. After looking at PDFlib and iTextSharp, I ended up using iTextSharp: PdfReader pdfReader = new PdfReader(pdfFilePath); int numberOfPages = pdfReader.NumberOfPages; Hope this helps someone facing the same problem.
MagicAndi
Thanks MagicAndi for posting the code. Very useful
lidermin
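For readers who want the one-liner from MagicAndi's comment in a reusable form, here is a minimal sketch. It assumes the iTextSharp assembly is referenced by the project; the class name PdfPageCounter and the method name GetNumberOfPages are made up for illustration and are not part of the library. PdfReader, NumberOfPages and Close() are the iTextSharp members used.

```csharp
using System;
using iTextSharp.text.pdf; // third-party library, must be referenced separately

// Hypothetical helper class; only PdfReader and its members come from iTextSharp.
public static class PdfPageCounter
{
    public static int GetNumberOfPages(string pdfFilePath)
    {
        // Opening the reader parses the document structure from the file on disk.
        PdfReader pdfReader = new PdfReader(pdfFilePath);
        try
        {
            return pdfReader.NumberOfPages;
        }
        finally
        {
            // Release the underlying file handle even if reading the count fails.
            pdfReader.Close();
        }
    }
}
```

The try/finally is only there so the file is not left locked when many PDFs are processed in a row.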
+1  A: 

PDFsharp

this one should be better =)

darkdog
+2  A: 

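I have used PDFlib for this.

int pageCount = 0;

try {

    PDFlib p = new PDFlib();

    /* Open the input PDF */
    int indoc = p.open_pdi_document("myTestFile.pdf", "");

    /* Ask pCOS for the number of pages */
    pageCount = (int) p.pcos_get_number(indoc, "length:pages");

}
catch (Exception e) {
    /* Handle or log the error */
    Console.WriteLine(e.Message);
}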

from
http://www.pdflib.com

Peter Gfader
+1  A: 

I have had good success using the ceTe DynamicPDF products. They're not free, but they are well documented. They did the job for me.

http://www.dynamicpdf.com/

Paul Lefebvre
+2  A: 

Found a way at http://www.dotnetspider.com/resources/21866-Count-pages-PDF-file.aspx. This does not require the purchase of a PDF library.

Rachael, finally reviewed this question, and checked out your link. Thanks, one to try next time this problem comes up! +1
MagicAndi
+1  A: 

This should do the trick:

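    // Requires: using System.IO; using System.Text.RegularExpressions;
    public int getNumberOfPdfPages(string fileName)
    {
        using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
        {
            // Match "/Type /Page" but not "/Type /Pages" (the page tree node).
            // Note: page objects stored in compressed object streams
            // will not be found by a plain-text scan like this one.
            Regex regex = new Regex(@"/Type\s*/Page[^s]");

            // Reads the whole file into memory before matching.
            MatchCollection matches = regex.Matches(sr.ReadToEnd());

            return matches.Count;
        }
    }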

From Rachel's answer and this one too.

Barrett
Barrett, thanks for providing example code. +1
MagicAndi
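To see what the regular expression in Barrett's answer actually counts, here is a small self-contained sketch that runs the same pattern against an in-memory string instead of a file. The sample text is a made-up, heavily simplified fragment of an uncompressed PDF body, and the names PdfPageRegexDemo and CountPageObjects are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PdfPageRegexDemo
{
    // Same pattern as in the answer: "/Type", optional whitespace, "/Page",
    // then any character other than 's' so that "/Pages" is skipped.
    private static readonly Regex PageObject = new Regex(@"/Type\s*/Page[^s]");

    public static int CountPageObjects(string pdfText)
    {
        return PageObject.Matches(pdfText).Count;
    }

    public static void Main()
    {
        // Hypothetical sample: one catalog, one page tree node, two page objects.
        string sample =
            "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n" +
            "2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n" +
            "3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n" +
            "4 0 obj << /Type/Page /Parent 2 0 R >> endobj\n";

        Console.WriteLine(CountPageObjects(sample)); // prints 2
    }
}
```

The catalog and the /Pages node are ignored and only the two page objects match. The pattern needs one more character after "/Page", so a page object whose text ends exactly at "/Page" would be missed.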
A: 

I've used the code above that solves the problem using regex and it works, but it's quite slow. It reads the entire file to determine the number of pages.

I used it in a web app where pages would sometimes list 20 or 30 PDFs at a time. In that circumstance, the load time for the page went from a couple of seconds to almost a minute because of the page-counting method.

I don't know if the third-party libraries are much better. I would hope that they are, and I've used PDFlib in other scenarios with success.

Ryan Bennett
Ryan, I have used the iTextSharp library to solve this problem, and found it to give decent performance. You could also look at PDFsharp. As for the issues with the regex solution, it is another example of regular expressions causing more problems than they solve - http://www.codinghorror.com/blog/archives/001016.html
MagicAndi
Agreed. I didn't see your note until after, but I replaced the RegEx function with one using iTextSharp as you recommend and there was a huge improvement in performance. Based on my tests the iTextSharp method is at least 5x faster than the RegEx method and usually a lot more than that, at least when I'm calculating for a number of PDF files at the same time (i.e. loading a page with multiple PDFs listed).
Ryan Bennett