Hi,

Trying to use Outlook Interop in C#, I noticed a curious thing.

Comparing the actual size of a saved attachment with the size reported by Outlook, I noticed that the saved file is always smaller than Attachment.Size indicates. The saved files seem to be valid and not truncated.

[Image: sample results]

So, what's wrong with it? Is there a bug in Attachment.Size? Or maybe it is expected to give something other than the size of an attachment?

At first I thought Outlook converts CR to CRLF, even in binary files, which could explain the overhead. But some of the attached files are plain text that already uses CRLF, so this hypothesis is wrong.
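For reference, the comparison described above can be reproduced with something like the following. This is only a minimal sketch: the class name `AttachmentSizeCheck`, the method `Report` and the `targetDir` parameter are made up for illustration, and it assumes every attachment has a usable `FileName` (some attachment types, such as embedded items, may not).

```csharp
using System;
using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;

static class AttachmentSizeCheck
{
    // Hypothetical helper: saves each attachment of a mail item to targetDir
    // and prints the size reported by Outlook next to the size on disk.
    public static void Report(Outlook.MailItem mail, string targetDir)
    {
        foreach (Outlook.Attachment attachment in mail.Attachments)
        {
            // Assumption: FileName is available and valid as a file name.
            string path = Path.Combine(targetDir, attachment.FileName);
            attachment.SaveAsFile(path);

            long reported = attachment.Size;          // size according to Outlook, in bytes
            long onDisk = new FileInfo(path).Length;  // size of the saved file, in bytes

            Console.WriteLine(
                $"{attachment.FileName}: reported={reported}, saved={onDisk}, diff={reported - onDisk}");
        }
    }
}
```

The `diff` column is the overhead discussed in the rest of this question.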


First edit:

It is not Base64 encoding, because Base64 encoding would be:

  • 4/3 ratio. In my case, I have a ratio which is not so far from 1.0.
  • Proportional. It is not the case here: a 1.9 MB file has an overhead of 181 bytes, whereas a 27 KB file has an overhead of 3 KB.
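To make the Base64 argument concrete, here is a rough calculation of what Base64 would cost for files of about those sizes. The exact byte counts are assumptions (27 KB taken as 27 * 1024 bytes and 1.9 MB as 1900 * 1024 bytes), and `Base64Overhead` is just an illustrative name. It uses the standard padded Base64 length of 4 * ceil(n / 3) characters, without line breaks.

```csharp
using System;

static class Base64Overhead
{
    // Number of characters in the padded Base64 encoding of n bytes (no line breaks).
    public static long EncodedLength(long n) => 4 * ((n + 2) / 3);

    static void Main()
    {
        // Assumed sizes standing in for the "27 KB" and "1.9 MB" files.
        foreach (long n in new long[] { 27 * 1024, 1900 * 1024 })
        {
            long encoded = EncodedLength(n);
            Console.WriteLine($"{n} bytes -> {encoded} Base64 chars, overhead {encoded - n} bytes");
        }

        // Cross-check the formula against the framework's own encoder.
        bool matches = Convert.ToBase64String(new byte[1000]).Length == EncodedLength(1000);
        Console.WriteLine($"Formula matches Convert.ToBase64String: {matches}");
    }
}
```

With these assumed sizes the Base64 overhead comes out at roughly 9 KB and over 600 KB respectively, which is nowhere near the 3 KB and 181 bytes I actually observe.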

Now, looking at nearly random overhead in a range of 89 to 3658 bytes, I would agree that it might be some strange headers.


Second edit:

I tested this on a larger set of files. What I notice is that the difference between real file size and size given by Outlook:

  • Is always zero for an .msg attachment. But an .msg attachment is a very special case and has very strange behavior.
  • Is influenced by both file extension and the length of file name.
  • For the same file extension, is usually, but not always, larger when the file name is longer.

Here is an example:

[Image: table of example files, not preserved]

IMHO, Outlook does something with the name of the file: some sort of very strange encoding, or maybe it generates a unique identifier based on the file name. This means that:

  • when the file is bigger, the unique identifier is bigger too.
  • when collision happens, something happens to the unique identifier, making it much, much bigger: row 18 has the same file name as row 11, but the file is not the same; on the other hand, rows 12, 13 and 14 have the same file.
A:

I'm not sure, but I'd assume that it might be MIME headers and/or encoding overhead. For more information, look at the Wikipedia article about Base64 and search for the word "overhead".

Edit: Sorry, I wasn't very clear. I meant the Base64 article only as an example showing that encoding can add overhead, not as a claim that it actually is Base64. As mentioned by others, Base64 overhead would probably be much larger than those differences.

ho1
Base64 overhead, I'd think, would be huge: something like 1/3 or more of the attachment's actual size. The differences in file sizes aren't anywhere near big enough to account for that, even #7 (the largest, by far). Add to that the fact that the numbers are inconsistent, and it looks like something else is at work here. The headers could possibly do it, but it depends on what's special about #7.
cHao