If I have a PDF file with objects (text, line-art) in a specific RGB color, and I want to convert those objects to have a specific CMYK color, what libraries are available that will let me do that?

Note that I don't need the ability to "map" from arbitrary RGB values to a "suitable" CMYK value - those values are pre-determined.

.NET/C# preferred, but I'd consider pretty much anything.

Obviously I'd prefer free/open libraries to paid ones, but depending on the licensing model I would consider paid ones too.

+1  A: 

Honestly? This is incredibly non-trivial.

PDF rendering is done through programs (content streams) that describe what will be rendered, in sequence. A graphics state accumulates the changes made by the program as it marks the page.

There are a number of different ways that colors can be set. Hopefully your PDF documents only use the operators RG and rg which set RGB colors for stroking and non-stroking operations. This means that color operations will be in the form:

rf gf bf RG

where rf, gf, and bf are floating point numbers representing color channel intensities from 0.0 to 1.0.

In that case, it would be a matter of rewriting the relevant RG and rg operators to K and k, respectively, which take four CMYK operands (c m y k K).
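As a rough illustration of that rewrite, here is a minimal Python sketch. It assumes the content stream has already been extracted and decompressed, that it contains no string literals, comments, or inline images (so splitting on whitespace is safe), and that the colors to change match exactly. COLOR_MAP, OPERATOR_MAP and rewrite_rgb_operators are hypothetical names made up for this example, not part of any PDF library.

```python
# Hypothetical mapping from exact RGB triples to the pre-determined CMYK values.
COLOR_MAP = {(1.0, 0.0, 0.0): (0.0, 0.0, 1.0, 0.0)}  # RGB red -> CMYK yellow

# RGB operators and their CMYK counterparts (stroking and non-stroking).
OPERATOR_MAP = {"RG": "K", "rg": "k"}


def fmt(value):
    """Format a number compactly, e.g. 1.0 -> '1' and 0.5 -> '0.5'."""
    return f"{value:g}"


def rewrite_rgb_operators(content, color_map=COLOR_MAP):
    """Replace matching 'r g b RG' / 'r g b rg' sequences with 'c m y k K' / 'k'.

    Assumption: `content` is a decoded content stream that can be tokenized
    by whitespace alone (no strings, comments, or inline image data).
    """
    out = []
    for tok in content.split():
        if tok in OPERATOR_MAP and len(out) >= 3:
            try:
                rgb = tuple(float(t) for t in out[-3:])
            except ValueError:
                rgb = None  # the preceding tokens were not three numbers
            if rgb in color_map:
                del out[-3:]  # drop the three RGB operands
                out.extend(fmt(v) for v in color_map[rgb])
                out.append(OPERATOR_MAP[tok])
                continue
        out.append(tok)
    return " ".join(out)
```

Colors that are not in the map are left untouched, so a stream can end up mixing RG and K operators, which is fine because each of those operators selects its own color space.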

This, in itself, is challenging: you would have to read in the document/page that you want, parse the content stream, and write a new one to replace the old one (again, possible but not trivial - PDF allows you to replace individual objects like the content stream by appending a new version of the object to the end of the file). Don't think about using sed. PDF is file-layout dependent: the cross-reference table stores byte offsets and each stream records its length, so changing something inline without maintaining the same length will break the PDF.
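To see why an inline edit is risky, consider this small Python sketch. The bytes below are a toy fragment, not a complete or valid PDF, and object_offsets is a hypothetical helper that stands in for the byte offsets a real cross-reference table would record.

```python
import re


def object_offsets(pdf_bytes):
    """Map object number -> byte offset of its 'N G obj' header."""
    return {
        int(m.group(1)): m.start()
        for m in re.finditer(rb"(\d+) (\d+) obj", pdf_bytes)
    }


# Toy fragment: a content stream (object 1) followed by another object.
body = (
    b"1 0 obj\n<< /Length 8 >>\nstream\n1 0 0 RG\nendstream\nendobj\n"
    b"2 0 obj\n<< /Type /Catalog >>\nendobj\n"
)

before = object_offsets(body)

# A sed-style inline edit: the replacement is one byte longer than the original.
edited = body.replace(b"1 0 0 RG", b"0 0 1 0 K")
after = object_offsets(edited)

# Object 2 has moved, so any stored offset for it is now stale, and the
# /Length 8 entry no longer matches the 9 bytes of stream data.
shift = after[2] - before[2]
```

Here shift comes out as 1: every object after the edit has moved by one byte, which is exactly the kind of mismatch that makes readers reject or mis-parse the file.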

The real problem will happen if the file uses the CS and cs operators. Consider this sequence of operations:

/DeviceRGB CS 1 0 0 SC 0 0 m 200 200 l S 200 200 m 200 0 l 0 1 0 SC S

This means set the color space to DeviceRGB, set the color to red, move to (0, 0), line to (200, 200), stroke (in red), move to (200, 200), line to (200, 0), set color to green and stroke.

This is not so simple - if you wanted to change RGB red to CMYK yellow, you could do this:

/DeviceCMYK CS 0 0 1 0 SC 0 0 m 200 200 l S 200 200 m 200 0 l 0 1 0 SC S

which will work for yellow, but will break the later attempt to set the color to green, since the current color space is now DeviceCMYK and SC therefore expects 4 operands.

What you need to do is interpret the content stream, keeping track of what the current color space is. When an SC command comes in that sets the color you want to change, you need to replace it with /DeviceCMYK CS c m y k SC, and then the next r g b SC command needs to change to /DeviceRGB CS r g b SC so that the color space is switched back.
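The bookkeeping described above might look roughly like the following Python sketch. It handles only the stroking operators CS and SC, ignores q/Q (which save and restore the graphics state, including the color space), and assumes a decoded stream that can be tokenized by whitespace. TARGET_RGB, REPLACEMENT_CMYK and remap_stroking_color are hypothetical names chosen for this example.

```python
TARGET_RGB = (1.0, 0.0, 0.0)              # the RGB color to replace (red)
REPLACEMENT_CMYK = ["0", "0", "1", "0"]   # its pre-determined CMYK value (yellow)


def is_operand(tok):
    """Names (/Foo) and numbers are operands; anything else is an operator."""
    if tok.startswith("/"):
        return True
    try:
        float(tok)
        return True
    except ValueError:
        return False


def remap_stroking_color(content):
    out = []
    operands = []         # operands collected since the last operator
    source_space = None   # color space selected by the original stream
    output_space = None   # color space selected in the stream we emit
    for tok in content.split():
        if is_operand(tok):
            operands.append(tok)
            continue
        if tok == "CS" and operands:
            source_space = output_space = operands[-1]
        elif tok == "SC":
            wanted_space = source_space
            is_rgb_triple = (
                source_space == "/DeviceRGB"
                and len(operands) == 3
                and not any(v.startswith("/") for v in operands)
            )
            if is_rgb_triple and tuple(float(v) for v in operands) == TARGET_RGB:
                wanted_space = "/DeviceCMYK"
                operands = list(REPLACEMENT_CMYK)
            # Switch color space only when the emitted stream is in the wrong one.
            if wanted_space != output_space:
                out.extend([wanted_space, "CS"])
                output_space = wanted_space
        out.extend(operands)
        out.append(tok)
        operands = []
    out.extend(operands)  # trailing operands, if any
    return " ".join(out)
```

Run on the example stream above, this keeps the original /DeviceRGB CS, inserts /DeviceCMYK CS before the replaced color, and inserts /DeviceRGB CS again before the green SC. The leading /DeviceRGB CS becomes redundant but harmless; a fuller version would also mirror the logic for cs/sc and track the color space across q/Q.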

This doesn't take into account how to handle ICC based color spaces, grays, LAB, n-channel, colormapped, patterns, forms etc.

PDF was not made for editing.

If I was tasked with making this happen, I would do the following:

  1. If it was for less than 10 files, I would open them up in Illustrator, change the colors and export in PDF
  2. If it was for 10 or more and less than 1000, I would hire a temp worker to do what I did in step 1.
  3. If it was 1000 or more and less than 10000, I would write a program to script Illustrator to make those changes, if possible.
  4. If it was 10000 or more and ongoing, I'd have a serious talk with management about document production, because changes like this should not have to be made to a terminal file format; the documents should be regenerated correctly from their source instead.
plinth
@Mr. Hawley: I understand that it's non-trivial, which is why I'm looking for a library :-) There *are* libraries out there that will read a PDF and provide a high-level API to modify it (or rather, to write out a modified version). The iText library is one example, which I use quite a bit. Unfortunately, that can't do the sort of manipulation I'm looking for here. Another one I've looked at is PdfTron - which *can* do this sort of thing - but I don't like their licensing model (and also, it's too heavyweight; I only need this one function).
Gary McGill
Still, thanks for taking the time to put together this well thought-out answer. +1
Gary McGill