How can I get the number of color pages in a PDF file using C#?

+1 Q:

Given a PDF file with color and black & white pages, is there any way with C# to find out among the given pages which are color and which are black & white?

Short of parsing all the PostScript-like page content, probably not. There's no flag on a PDF page that says whether it is b&w or color, so you'd have to check the color of every element placed on the page. I'm not sure what libraries exist for reading PDFs in C#, but you would need one that reads all of the elements.

Similarly, any images you have on the page would need to be checked for color and that is not simple. Color image formats can hold b&w images.

jmucchiello 2009-05-11 14:33:38

Check out http://csharp-source.net/open-source/pdf-libraries for a list; any number of these should be up to the task.

codeelegance 2009-05-11 14:39:51

My point is you need to write a lot of code to read through the PDF. There are issues involving images, recurring headers and footers, etc.

jmucchiello 2009-05-11 15:15:06

Thanks for your reply. While searching the net I found that there are some color directives in PDF, but I could not understand them. Do you have any idea about these?

2009-05-12 10:38:45

I may be wrong, but PDFs are based on PostScript. PS is a stack-based language that draws stuff on "pages". Directives in PS can jump to page locations, draw lines, circles, and other shapes, and output strings of text in the current font face/size/color. Between all these actions, the color of the drawing implement can be altered. Text on the page does not need to be drawn in any particular order. This means you need to parse the whole page looking for the commands that involve color. You should probably look for a tutorial about PS to understand PDF construction at the level you need.
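To make the "commands that involve color" concrete: in a PDF content stream, `rg`/`RG` set an RGB fill/stroke color, `k`/`K` set a CMYK one, and `g`/`G` set a gray level. Below is a rough sketch only, assuming you have already obtained the decoded (decompressed) content stream of a page as text from whatever PDF library you use. The `ColorOperatorScan` class and `UsesColor` method are hypothetical names. The scan is deliberately naive: it ignores images, color spaces selected with `cs`/`scn`, form XObjects, and operator-like text inside strings, so a `false` result does not prove the page is b&w.

```csharp
using System;
using System.Globalization;

static class ColorOperatorScan
{
    // Parses a PDF number such as "0", "1.0" or ".5".
    static bool TryNum(string s, out double v) =>
        double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out v);

    // Returns true if the stream sets an RGB color with unequal components
    // or a CMYK color with a non-zero C, M or Y component.
    // Assumption: operands and operators are separated by whitespace.
    public static bool UsesColor(string contentStream)
    {
        string[] t = contentStream.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < t.Length; i++)
        {
            if ((t[i] == "rg" || t[i] == "RG") && i >= 3)
            {
                // Operands precede the operator: r g b rg
                if (TryNum(t[i - 3], out double r) &&
                    TryNum(t[i - 2], out double g) &&
                    TryNum(t[i - 1], out double b) &&
                    !(r == g && g == b))
                    return true;
            }
            else if ((t[i] == "k" || t[i] == "K") && i >= 4)
            {
                // Operands precede the operator: c m y k k
                if (TryNum(t[i - 4], out double c) &&
                    TryNum(t[i - 3], out double m) &&
                    TryNum(t[i - 2], out double y) &&
                    (c != 0 || m != 0 || y != 0))
                    return true;
            }
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(UsesColor("0 0 0 rg 10 10 100 50 re f")); // prints "False"
        Console.WriteLine(UsesColor("1 0 0 rg 10 10 100 50 re f")); // prints "True"
    }
}
```

Even as a first filter this shows why the job is a lot of code: every case the sketch skips (images, named color spaces, nested content) needs its own handling.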

jmucchiello 2009-05-12 20:58:26

+1 A:

My recommendation is to render each page to an image and then check each pixel for RGB values not equal to each other. If R=G=B for each pixel then it's a grayscale image.
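The pixel test itself might look something like the sketch below. It assumes the page has already been rendered by a PDF library and that you can get the pixels as packed 8-bit RGB triples; the rendering step is not shown because it depends on the library. `GrayscaleCheck`, `IsGrayscale`, and the `tolerance` parameter are hypothetical names, and the tolerance is an addition to the strict R=G=B rule to allow for slight tinting from anti-aliasing or JPEG compression.

```csharp
using System;

static class GrayscaleCheck
{
    // pixels: packed 8-bit RGB triples (R, G, B, R, G, B, ...) for one rendered page.
    // tolerance: largest spread between channels still treated as gray
    //            (0 means strict R = G = B).
    public static bool IsGrayscale(byte[] pixels, int tolerance = 0)
    {
        if (pixels.Length % 3 != 0)
            throw new ArgumentException("Expected packed RGB triples.", nameof(pixels));

        for (int i = 0; i < pixels.Length; i += 3)
        {
            int r = pixels[i], g = pixels[i + 1], b = pixels[i + 2];
            int max = Math.Max(r, Math.Max(g, b));
            int min = Math.Min(r, Math.Min(g, b));
            if (max - min > tolerance)
                return false; // found a colored pixel, no need to look further
        }
        return true;
    }

    static void Main()
    {
        byte[] grayPage  = { 0, 0, 0, 128, 128, 128, 255, 255, 255 };
        byte[] colorPage = { 0, 0, 0, 200, 30, 30, 255, 255, 255 };

        Console.WriteLine(IsGrayscale(grayPage));  // prints "True"
        Console.WriteLine(IsGrayscale(colorPage)); // prints "False"
    }
}
```

Counting the color pages is then a loop over the rendered pages that increments a counter whenever `IsGrayscale` returns false.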

You could then perform actions (such as extracting a page to another document or printing the page) on the pages based on whether they are color pages or black and white pages, etc.

This can be achieved by using my company's PDF developer library, Quick PDF, or potentially with one of the open source PDF libraries that Kenneth suggested.

Rowan 2009-05-12 06:45:08

Check out:

PDF-Analyser

I use his tools for text extraction and PDF analysis. They are very inexpensive, royalty free, and work well. I think GetPDFColourStyle, part of the PDFLayoutPlus library, should do the trick.

Douglas Anderson 2009-05-12 22:00:05
