



I am working on a feed-reading iPhone app that displays NSData objects (HTML and PDF) in a UIWebView, and I have hit a snag in some PDF validation logic. I have an NSData object which I know contains a file with a .pdf extension. I would like to stop invalid PDFs from getting any further. Here's my first attempt at validation code, which seems to work in the majority of cases:

// pdfData is an NSData *
NSData *validPDF = [@"%PDF" dataUsingEncoding:NSASCIIStringEncoding];
// The length check also covers nil and avoids an NSRangeException on data shorter than 4 bytes.
if (!([pdfData length] >= 4 && [[pdfData subdataWithRange:NSMakeRange(0, 4)] isEqualToData:validPDF])) {
    // error
}

Unfortunately, a new PDF was uploaded a few days ago. It is valid in the sense that the UIWebView displays it fine, yet it fails my validation test. I have tracked the issue down to a run of garbage bytes at the beginning, with the %PDF coming midway through the 14th group of hex digits (the 0x25, or %, sits at byte offset 54, counting from zero):

%PDF: 25504446
Breaking PDF: 00010000 00ffffff ff010000 00000000 000f0100 0000b5e0 04000200 01000000 ffffffff 01000000 00000000 0f010000 0099e004 00022550 44462d31 etc...

What is the best practice for validating NSData to be a PDF?
What might be wrong with this particular PDF (it claims it was encoded by PaperPort 11.0, whatever that is)?



+2  A: 

This question seems quite helpful :

or, if you're feeling adventurous, here's the spec (from the Adobe site here)

Yes, thanks for linking. It sucks that there is some fuzziness around the header. So do I just search the first 1024 bytes of the NSData for %PDF, 4 bytes at a time, or is there some way to search a chunk of data for a character string?
You could turn the first 1024 bytes into an NSString (using NSData's subdataWithRange: to get the first 1024 bytes and then NSString's initWithData:encoding: to decode them into an NSString object) and then search that string with rangeOfString: to see if %PDF is in there.
NB Make sure you've got >= 1024 bytes or at least one of those methods is going to throw an exception ;)
Ignore my earlier comments, there is a much easier way: just use NSData's rangeOfData:options:range: to search ;) Perhaps I shouldn't answer questions while I'm jet lagged and it's 3am!
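For anyone landing here later, the rangeOfData:options:range: suggestion could look roughly like the sketch below. The helper name LooksLikePDF is made up for illustration, and the 1024-byte window is simply the figure discussed in the comments above; it assumes an SDK recent enough to provide rangeOfData:options:range: on NSData.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns YES if the
// "%PDF" marker appears anywhere in the first 1024 bytes of the data.
static BOOL LooksLikePDF(NSData *pdfData) {
    // Messaging nil returns 0, so this also rejects a nil pdfData.
    if ([pdfData length] == 0) {
        return NO;
    }
    NSData *marker = [@"%PDF" dataUsingEncoding:NSASCIIStringEncoding];
    // Clamp the search window so short data does not raise NSRangeException.
    NSUInteger window = MIN([pdfData length], (NSUInteger)1024);
    NSRange found = [pdfData rangeOfData:marker
                                 options:0
                                   range:NSMakeRange(0, window)];
    return found.location != NSNotFound;
}
```

It would replace the prefix comparison in the question, e.g. `if (!LooksLikePDF(pdfData)) { /* error */ }`. Note that this only checks for the header marker; it says nothing about whether the rest of the file is well formed.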
Hey, I appreciate the help, Dean. I went ahead and did that and it worked, but I'm actually just going to stop validating, since hopefully my content editors will view their PDFs in advance.