I have some C# code that uploads images and files into a db. I thought it was working for all documents that match a list of MIME types. However, it fails on some PDFs.

I have narrowed the problem down to the fact that some PDFs are in the 1.3 format and some are in the 1.4 format. The 1.4 files work and are uploaded properly. The 1.3 files do not upload, and no runtime error is generated; they simply fail to be added.

Some of the current code for uploading the PDF is:

// Checks for a valid MIME type
// ...

byte[] fileData = new byte[uploadFile.ContentLength];
uploadFile.InputStream.Read(fileData, 0, uploadFile.ContentLength);

// ...
// Continues on to upload to the db.

For PDFs it is looking for "application/pdf" as the MIME type. I don't think there is another type for PDFs in the 1.3 format, but maybe I am wrong.

Has anyone else had this problem before, and do you have any advice on how to correct it?

+1  A: 

The problem may be the way you're reading from the input stream.

Whenever you deal with streams, you should read repeatedly, taking note of the return value on each iteration. So your original code should be:

byte[] fileData = new byte[uploadFile.ContentLength];
int totalRead = 0;
while (totalRead < fileData.Length)
{
    int read = uploadFile.InputStream.Read(fileData, totalRead,
                                           fileData.Length - totalRead);
    if (read == 0)
    {
        throw new IOException("Input data was truncated");
    }
    totalRead += read;
}

However, that may not be the problem. I'd expect that to result in truncated data, not a complete absence of data. When you say it "just fails to be added" could you be more specific? How much logging have you put in? Where's the code which actually inserts the data into the database? What mime type do your logs show for the cases where it's failing?

It sounds to me like extra logging would probably make a huge difference here... currently either you don't know where it's going wrong, or you just haven't told us. Logging should make that quite clear.
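As a rough sketch of the kind of logging meant here: the helper below builds one line describing an upload attempt. `UploadDiagnostics` and `Describe` are hypothetical names, not part of the original code, and where the line gets written (console, file, logging framework) is left open.

```csharp
using System;

static class UploadDiagnostics
{
    // Hypothetical helper: formats the details worth logging for each upload attempt.
    public static string Describe(string fileName, string contentType,
                                  int contentLength, int bytesRead)
    {
        return string.Format(
            "Upload: name='{0}', type='{1}', declared={2} bytes, read={3} bytes",
            fileName, contentType, contentLength, bytesRead);
    }
}
```

Assuming `uploadFile` is an `HttpPostedFile`, a call such as `UploadDiagnostics.Describe(uploadFile.FileName, uploadFile.ContentType, uploadFile.ContentLength, totalRead)` just before the MIME type check and again before the database insert should show which step rejects the file.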

Jon Skeet
A: 

I agree with Jon Skeet's answer to this question. The difference is probably because Adobe added a handful of new compression techniques to PDF 1.4, so your 1.4 PDFs could be significantly smaller than the 1.3 PDFs. So, the need to read in a loop may only manifest for v1.3 PDFs as a consequence. (but that's just a guess)

Chris Dolan
A: 

Maybe it is not the version of the PDF. Didn't Adobe introduce linearized PDFs in 1.4?

With a "fast web view" (linearized) PDF, data is available immediately; with a standard PDF, data is not available until the upload is finished. If you try to write to the db before the transfer is finished, it might work with a linearized PDF and not with a standard one.

Either way, Jon Skeet is right. Logging or properly placed breakpoints will tell you.

R Ubben
A: 

It seems that it was just a stupid mistake. The file had a .PDF extension, unlike the rest, which had .pdf. Stupid caps got me. A few extra debug statements did the trick.
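For anyone hitting the same thing, here is a minimal sketch of a check that ignores case. The actual validation code isn't shown above, so the `AllowedExtensions` set, its contents, and the `IsAllowedExtension` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class UploadValidation
{
    // Hypothetical whitelist; the comparer makes lookups ignore case,
    // so ".PDF" and ".pdf" are treated as the same extension.
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".pdf", ".jpg", ".png" };

    public static bool IsAllowedExtension(string fileName)
    {
        // Path.GetExtension returns the extension including the leading dot,
        // an empty string when there is none, or null for a null input.
        string extension = Path.GetExtension(fileName);
        return !string.IsNullOrEmpty(extension) && AllowedExtensions.Contains(extension);
    }
}
```

With this, `UploadValidation.IsAllowedExtension("report.PDF")` and `UploadValidation.IsAllowedExtension("report.pdf")` both return true. If the comparison is done with plain strings instead of a set, `string.Equals(a, b, StringComparison.OrdinalIgnoreCase)` has the same effect.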

corymathews