I'm working on a feed-reading iPhone app which displays NSData objects (HTML and PDF) in a UIWebView, and I am hitting a snag in some PDF validation logic. I have an NSData object which I know contains a file with a .pdf extension. I would like to stop invalid PDFs from getting any further. Here's my first attempt at validation code, which seems to work for the majority of cases:

// pdfData is an NSData *
NSData *validPDF = [@"%PDF" dataUsingEncoding:NSASCIIStringEncoding];
// Check the length first: subdataWithRange: raises NSRangeException on data shorter than 4 bytes
if (!([pdfData length] >= 4 && [[pdfData subdataWithRange:NSMakeRange(0, 4)] isEqualToData:validPDF])) {
    // error
}

Unfortunately, a new PDF was uploaded a few days ago. It is valid in the sense that the UIWebView displays it fine, yet it fails my validation test. I have tracked the issue down to the fact that it has a bunch of garbage bytes at the beginning, with the %PDF coming midway through the 14th set of hex characters (54 bytes precede the 25, or %, which puts it at byte offset 54):

%PDF: 25504446
Breaking PDF: 00010000 00ffffff ff010000 00000000 000f0100 0000b5e0 04000200 01000000 ffffffff 01000000 00000000 0f010000 0099e004 00022550 44462d31 etc...

What is the best practice for validating NSData to be a PDF?
What might be wrong with this particular PDF (it claims it was encoded by PaperPort 11.0, whatever that is)?

Thanks,

Mike

+2  A: 

This question seems quite helpful:

http://stackoverflow.com/questions/3108201/detect-if-pdf-file-is-correct-header-pdf

or, if you're feeling adventurous, here's the spec (from the Adobe site here)

deanWombourne
Yes, thanks for linking. It sucks that there is some fuzziness around the header. So do I just search the first 1024 bytes of the NSData for %PDF, 4 bytes at a time, or is there some way to search a chunk of data for a character string?
TahoeWolverine
You could turn the first 1024 chars into an NSString (using NSData's _subdataWithRange:_ to get the first 1024 bytes and then NSString's _initWithData:encoding:_ method to encode it into an NSString object) and then search it with _rangeOfString:_ to see if %PDF is in there.
deanWombourne
NB Make sure you've got >= 1024 bytes or at least one of those methods is going to throw an exception ;)
deanWombourne
Ignore my earlier comments, there is a much easier way; just use NSData's _rangeOfData:options:range:_ to search ;) Perhaps I shouldn't answer questions while I'm jet lagged and it's 3am!
deanWombourne
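For readers following along, here is a minimal sketch of what that last suggestion might look like. The helper name DataLooksLikePDF is hypothetical (it does not appear in the thread), and the 1024-byte window is simply the figure discussed in the comments above. It only checks that the %PDF marker is present near the start of the data; it does not prove the rest of the file is well formed.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): returns YES if the "%PDF" marker
// appears anywhere within the first 1024 bytes of the data.
static BOOL DataLooksLikePDF(NSData *pdfData) {
    // Messaging nil returns 0, so a nil pdfData is rejected here as well.
    if ([pdfData length] == 0) {
        return NO;
    }
    NSData *marker = [@"%PDF" dataUsingEncoding:NSASCIIStringEncoding];
    // Clamp the search window so short inputs never cause a range exception.
    NSUInteger window = MIN([pdfData length], (NSUInteger)1024);
    NSRange found = [pdfData rangeOfData:marker
                                 options:0
                                   range:NSMakeRange(0, window)];
    return found.location != NSNotFound;
}
```

Clamping the range to the data's length sidesteps the exception mentioned earlier for inputs shorter than 1024 bytes, and a PDF with leading junk like the PaperPort one would pass as long as the marker starts within that window.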
Hey I appreciate the help, Dean. I went ahead and did that and it worked, but I'm actually just going to stop validating, given the fact that hopefully my content editors will view their pdfs in advance.
TahoeWolverine