I'm working on a feed-reading iPhone app that displays NSData objects (HTML and PDF) in a UIWebView, and I've hit a snag in some PDF validation logic. I have an NSData object which I know contains a file with a .pdf extension. I would like to stop invalid PDFs from getting any further. Here's my first attempt at validation code, which seems to work for the majority of cases:
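// pdfData is an NSData *
NSData *validPDF = [@"%PDF" dataUsingEncoding:NSASCIIStringEncoding];
// Check the length first: subdataWithRange: raises NSRangeException on data shorter than 4 bytes.
// A nil pdfData reports a length of 0, so it also lands in the error branch.
if (!([pdfData length] >= 4 && [[pdfData subdataWithRange:NSMakeRange(0, 4)] isEqualToData:validPDF])) {
    // error
}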
Unfortunately, a new PDF was uploaded a few days ago. It is valid in the sense that the UIWebView displays it fine, yet it fails my validation test. I have tracked the issue down to a run of garbage bytes at the beginning of the file, with %PDF starting midway through the 14th group of hex characters (the 25, or %, sits at zero-based byte offset 54):
%PDF: 25504446
Breaking PDF: 00010000 00ffffff ff010000 00000000 000f0100 0000b5e0 04000200 01000000 ffffffff 01000000 00000000 0f010000 0099e004 00022550 44462d31 etc...
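For reference, a more lenient check could scan a leading window of the data for the signature instead of requiring it at offset 0. Below is a minimal sketch in plain C (which also compiles as Objective-C). The helper name `pdf_header_offset` is hypothetical, and the window size is an assumption: 1024 bytes is the figure commonly cited for how far into a file Acrobat viewers look for the header, and I have not confirmed that UIWebView follows the same rule.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any SDK): returns the zero-based offset
   of the "%PDF-" signature if it lies entirely within the first `window`
   bytes of `bytes`, or -1 if it is not found there. */
static long pdf_header_offset(const unsigned char *bytes, size_t length, size_t window)
{
    static const char signature[] = "%PDF-";
    const size_t sig_len = sizeof signature - 1;   /* exclude the trailing NUL */
    const size_t limit = length < window ? length : window;

    /* Nothing to search, or too short to hold the signature at all. */
    if (bytes == NULL || limit < sig_len) {
        return -1;
    }

    /* Slide over every position where the full signature still fits. */
    for (size_t i = 0; i + sig_len <= limit; i++) {
        if (memcmp(bytes + i, signature, sig_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

From Objective-C this could be called as `pdf_header_offset([pdfData bytes], [pdfData length], 1024)`, treating a result of -1 as invalid. A non-zero result would flag files like the one above, which have leading junk but may still render.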
What is the best practice for validating NSData to be a PDF?
What might be wrong with this particular PDF (it claims it was encoded by PaperPort 11.0, whatever that is)?
Thanks,
Mike