This is a very general question, but it's based on a specific problem. I've created a pdf reader app for the iPad and it works fine except for certain pdf pages which always crash the app. We have now found out that the very same pages cause Safari to crash as well, so, as I had started to suspect, the problem is somewhere in Apple's pdf rendering code.

From what I have been able to see, the crashing pages cause the rendering libraries to start allocating memory like mad until the app is killed. I have nothing else to help me pinpoint what triggers this process.

It doesn't necessarily happen with the largest documents, or the ones with the most shapes. In fact, we haven't found any parameter that helps us predict which pages will crash and which will not.

We have just discovered that running the pages through a consumer program that lets you merge docs gets rid of the problem, but I haven't been able to detect which attribute or element is the key. Changing documents by hand is also not an option for us in the long run; we need to run an automated process on our server.

I'm hoping someone with deeper knowledge about the pdf file format would be able to point me in a reasonable direction to look for document features that could cause this kind of behavior. All I've found so far is something about JBIG2 images, and I don't think we have any of those.
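One way to check for JBIG2 (or any other stream filter) without opening each file by hand is to scan the raw bytes for the standard filter names. The following is only a rough sketch: the function name `count_filters` is made up for illustration, and a plain byte scan can report false positives if the same bytes happen to occur inside binary stream data. It also ignores the abbreviated filter names allowed in inline images.

```python
import re
from collections import Counter

# Standard stream filter names defined by the PDF specification.
FILTER_NAMES = [
    "ASCIIHexDecode", "ASCII85Decode", "LZWDecode", "FlateDecode",
    "RunLengthDecode", "CCITTFaxDecode", "JBIG2Decode", "DCTDecode",
    "JPXDecode",
]

_FILTER_RE = re.compile(
    rb"/(" + "|".join(FILTER_NAMES).encode("ascii") + rb")\b"
)


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of each standard filter name in the raw file bytes."""
    return Counter(
        m.group(1).decode("ascii") for m in _FILTER_RE.finditer(pdf_bytes)
    )


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        counts = count_filters(f.read())
    for name in FILTER_NAMES:
        print(f"{name}: {counts[name]}")
```

A count of zero for `JBIG2Decode` would support the assumption that JBIG2 images are not involved.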

+1  A: 

It is not the PDF feature itself but the support for it which is the issue. You need to take the PDF apart and see what it contains - you can do this in Acrobat 9.0 - there is an article showing how you can use it to see inside the PDF at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects

We were sent some PDFs which crashed Mail on OS X and the issue turned out to be the embedded, subsetted fonts.

mark stephens
Do you mean to say that it's the code that parses and renders the document that crashes rather than the file itself? Yes, I am aware of that. And we have been digging around a lot in the files, trying to disable this element or that. It doesn't seem to be the fonts in our case, but I'll look into it a bit more. Thanks.
Felixyz
A: 

Did you find a solution for this problem, Felixyz? We encountered the very same behaviour with just one page of a 130-page document. Plus, one or two other pages take ages to render, but don't crash the app.

Tampinski
@Tampinski: Sorry for the late response! We had to do a lot of trial and error, taking the files apart and putting them back together. We managed to stop the crashes from happening, but we couldn't isolate exactly what caused them. Lots of vector shapes and gradients are likely suspects. My impression was that the pdf parser might enter some kind of loop, because it started to allocate memory like crazy, but this is just a theory. If you or anyone else finds out more, please continue on this thread!
Felixyz
+1  A: 

Same issue encountered with two 'special' PDFs that couldn't be rendered on an iPad app or Safari for iPad. In my case, the problem was isolated to some semi-transparent gradient shades.

By the way, converting the PDF to PostScript, and then back to PDF again, seems to remove the internal elements that PDFKit doesn't like. The original document was 1.9 MB in size with lots of vector shapes. After the conversion process the file was reduced to about 600 KB and rendered flawlessly on the iPad.
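For an automated server-side version of this round trip, something like the sketch below might work. It swaps in Ghostscript, which is not the tool used for the conversion described above, and assumes a `gs` executable is on the PATH. The helper names `build_roundtrip_commands` and `roundtrip` are invented for illustration. Since PostScript has no transparency model, the output may look different from the original wherever transparency is used, so results should be checked visually.

```python
import subprocess
from pathlib import Path


def build_roundtrip_commands(src_pdf, tmp_ps, dst_pdf, gs="gs"):
    """Return the two Ghostscript command lines: PDF -> PS, then PS -> PDF."""
    common = [gs, "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER"]
    to_ps = common + [
        "-sDEVICE=ps2write", f"-sOutputFile={tmp_ps}", str(src_pdf)
    ]
    to_pdf = common + [
        "-sDEVICE=pdfwrite", f"-sOutputFile={dst_pdf}", str(tmp_ps)
    ]
    return [to_ps, to_pdf]


def roundtrip(src_pdf, dst_pdf, gs="gs"):
    """Run the PDF -> PS -> PDF conversion, removing the temporary PS file."""
    tmp_ps = Path(dst_pdf).with_suffix(".ps")
    try:
        for cmd in build_roundtrip_commands(src_pdf, tmp_ps, dst_pdf, gs):
            # check=True raises CalledProcessError if Ghostscript fails.
            subprocess.run(cmd, check=True)
    finally:
        if tmp_ps.exists():
            tmp_ps.unlink()


if __name__ == "__main__":
    import sys

    roundtrip(sys.argv[1], sys.argv[2])
```

Keeping the command construction separate from the execution makes it easy to log or inspect exactly what will be run before processing a batch of files.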

soliosg
@soliosg: Thanks for this info. What program/library did you use to do the PS/PDF conversion? I suggested to my client that they use Ghostscript to modify the files, but in the end they built their own solution.
Felixyz
No special open source programs/libraries. I used Adobe Distiller 6.0 as it was available at work.
soliosg
+1  A: 

We are experiencing similar problems and have found that tensor shading elements will definitely crash your app. Always! It is absolutely reproducible.
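To flag such files before they reach the device, one could look for the shading types in the raw PDF bytes. In the PDF specification, `/ShadingType 7` is the tensor-product patch mesh and `/ShadingType 6` is the closely related Coons patch mesh. The sketch below is a simple byte scan, not a real PDF parser, and the function names are hypothetical. It can miss shading dictionaries stored inside compressed object streams (mainly types 1 to 3) and does not handle a `/ShadingType` value given as an indirect reference.

```python
import re
from collections import Counter

# Shading type numbers as defined by the PDF specification.
SHADING_NAMES = {
    1: "function-based",
    2: "axial",
    3: "radial",
    4: "free-form Gouraud triangle mesh",
    5: "lattice-form Gouraud triangle mesh",
    6: "Coons patch mesh",
    7: "tensor-product patch mesh",
}

_SHADING_RE = re.compile(rb"/ShadingType\s+(\d+)")


def count_shading_types(pdf_bytes: bytes) -> Counter:
    """Count each /ShadingType value found in the raw file bytes."""
    return Counter(int(m.group(1)) for m in _SHADING_RE.finditer(pdf_bytes))


def has_tensor_shading(pdf_bytes: bytes) -> bool:
    """True if at least one tensor-product patch mesh (type 7) is present."""
    return count_shading_types(pdf_bytes)[7] > 0


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for kind, n in sorted(count_shading_types(data).items()):
        print(f"type {kind} ({SHADING_NAMES.get(kind, 'unknown')}): {n}")
```

Files for which `has_tensor_shading` returns true could then be routed through a conversion step or held back for manual review.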

Gerard64
A: 

@Felixyz: "running the pages through a consumer program that lets you merge docs gets rid of the problem" -- could you kindly share which consumer program helped you to get rid of the malformed pages? Thanks.

Gerhard Miller