I'm trying to add file compression to an application. The application has been around for a while, so it needs to be able to read uncompressed documents written by previous versions. I expected that DeflateStream would be able to process an uncompressed file, but for GZipStream I get the "The magic number in GZip header is not correct" error. For DeflateStream I get "Found invalid data while decoding". I guess it does not find the header that marks the file as the type it is.

If it's not possible to simply process an uncompressed file, then the next best thing would be a way to determine whether a file is compressed, and to choose the method of reading the file accordingly. I've found this link: http://blog.somecreativity.com/2008/04/08/how-to-check-if-a-file-is-compressed-in-c/, but this is very implementation-specific, and doesn't feel like the right approach. It can also produce false positives (I'm sure this would be rare, but it does indicate that it's not the right approach).

A third option I've considered is to attempt using DeflateStream, and fall back to normal stream IO if an exception occurs. This also feels messy, and causes VS to break at the exception (unless I untick that exception, which I don't really want to have to do).

Of course, I may simply be going about it the wrong way. This is the code I've tried in .NET 3.5:

Stream reader = new FileStream(fileName, FileMode.Open, readOnly ? FileAccess.Read : FileAccess.ReadWrite, readOnly ? FileShare.ReadWrite : FileShare.Read);

using (DeflateStream decompressedStream = new DeflateStream(reader, CompressionMode.Decompress))
{
    workspace = (Workspace)new XmlSerializer(typeof(Workspace)).Deserialize(decompressedStream);

    if (readOnly)
    {
        reader.Close();
        workspace.FilePath = fileName;
    }
    else
        workspace.SetOpen(reader, fileName);
}

Any ideas?

Thanks! Luke.

+1  A: 

Doesn't your file format have a header? If not, now is the time to add one (you're changing the file format by supporting compression, anyway). Pick a good magic value, make sure the header is extensible (add a version field, or use specific magic values for specific versions), and you're ready to go.

Upon loading, check for the magic value. If not present, use your current legacy loading routines. If present, the header will tell you whether the contents are compressed or not.

Update

Compressing the stream means the file is no longer a plain XML document, so there's no reason the file can't contain more than just your data stream. You really do want a header identifying your file :)

The code below is example (pseudo)code; I don't know if .NET has a "substream", so SubRangeStream is likely something you'll have to code yourself (DeflateStream probably adds its own header, so a substream might not be necessary; it could turn out useful further down the road, though).

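// IsRightMagicValue, ReadHeader, SubRangeStream and LegacyLoad are placeholders you would write yourself.
long oldPosition = reader.Position;
byte[] magic = new byte[4]; // magic length chosen for illustration
int bytesRead = reader.Read(magic, 0, magic.Length);
if (bytesRead == magic.Length && IsRightMagicValue(magic))
{
    // New format: the header says how long the content is and whether it is compressed.
    Header header = ReadHeader(reader);
    Stream furtherReader = new SubRangeStream(reader, reader.Position, header.ContentLength);
    if (header.IsCompressed)
    {
        furtherReader = new DeflateStream(furtherReader, CompressionMode.Decompress);
    }

    XmlSerializer xml = new XmlSerializer(typeof(Workspace));
    workspace = (Workspace)xml.Deserialize(furtherReader);
}
else
{
    // No magic value: rewind and let the existing loader read the plain XML.
    reader.Position = oldPosition;
    LegacyLoad(reader);
}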
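
Since SubRangeStream is not a framework class, here is a minimal sketch of what one might look like. It is an assumption about the intended behaviour rather than a definitive implementation: a read-only window over part of a seekable base stream, with the constructor arguments (base stream, start offset, length) matching the call in the pseudocode above.

```csharp
using System;
using System.IO;

// Hypothetical helper: a read-only view over [start, start + length) of a seekable stream.
sealed class SubRangeStream : Stream
{
    private readonly Stream inner;
    private readonly long start;
    private readonly long length;
    private long position; // relative to 'start'

    public SubRangeStream(Stream inner, long start, long length)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (!inner.CanSeek) throw new ArgumentException("Base stream must be seekable.", "inner");
        if (start < 0 || length < 0) throw new ArgumentOutOfRangeException();
        this.inner = inner;
        this.start = start;
        this.length = length;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return length; } }

    public override long Position
    {
        get { return position; }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException("value");
            position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Never hand out bytes beyond the end of the window.
        long remaining = length - position;
        if (remaining <= 0) return 0;
        if (count > remaining) count = (int)remaining;

        // Reposition on every read so other users of the base stream cannot confuse us.
        inner.Position = start + position;
        int read = inner.Read(buffer, offset, count);
        position += read;
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target;
        switch (origin)
        {
            case SeekOrigin.Begin: target = offset; break;
            case SeekOrigin.Current: target = position + offset; break;
            default: target = length + offset; break; // SeekOrigin.End
        }
        Position = target;
        return position;
    }

    public override void Flush() { }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Note that this sketch does not close the base stream when it is disposed, so the caller stays responsible for the underlying FileStream.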

In real life, I would do things a bit differently: some proper error handling and cleanup, for instance. Also, I wouldn't have the new loader code directly in the IsRightMagicValue block. Instead, I'd spin off the work either based on the magic value (one magic value per file version), or I would keep a "common header" portion with fields common to all versions. For both, I'd use a Factory Method to return an IWorkspaceReader depending on the file version.
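
As a rough illustration of the factory idea, the sketch below picks a reader based on a per-version magic value. Everything in it is hypothetical: the IWorkspaceReader interface shape, the reader class names, the four-byte magic value, and the stand-in Workspace class (the real application type would be used instead). It also assumes the "one magic value per file version" variant rather than a common header.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Serialization;

public class Workspace { } // stand-in for the application's real type

public interface IWorkspaceReader
{
    Workspace Read(Stream stream);
}

// Legacy files: plain XML from the very start of the stream.
public sealed class LegacyXmlReader : IWorkspaceReader
{
    public Workspace Read(Stream stream)
    {
        return (Workspace)new XmlSerializer(typeof(Workspace)).Deserialize(stream);
    }
}

// Hypothetical "version 1" files: magic value followed by deflate-compressed XML.
public sealed class DeflatedXmlReader : IWorkspaceReader
{
    public Workspace Read(Stream stream)
    {
        // leaveOpen = true: the caller still owns the underlying stream.
        using (DeflateStream inflater = new DeflateStream(stream, CompressionMode.Decompress, true))
        {
            return (Workspace)new XmlSerializer(typeof(Workspace)).Deserialize(inflater);
        }
    }
}

public static class WorkspaceReaderFactory
{
    // Made-up magic value; it must not match the start of a legacy XML file.
    private static readonly byte[] MagicV1 = { (byte)'W', (byte)'S', (byte)'P', 1 };

    // Inspects the start of a seekable stream and leaves it positioned for the returned reader.
    public static IWorkspaceReader Create(Stream stream)
    {
        long start = stream.Position;
        byte[] magic = new byte[MagicV1.Length];
        int total = 0;
        while (total < magic.Length)
        {
            int read = stream.Read(magic, total, magic.Length - total);
            if (read == 0) break; // file shorter than the magic value
            total += read;
        }

        if (total == magic.Length && Matches(magic, MagicV1))
            return new DeflatedXmlReader(); // stream now sits just after the magic value

        stream.Position = start; // no magic value: rewind for the legacy loader
        return new LegacyXmlReader();
    }

    private static bool Matches(byte[] a, byte[] b)
    {
        for (int i = 0; i < b.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

A caller would then write something like `workspace = WorkspaceReaderFactory.Create(reader).Read(reader);`, and supporting a later file version means adding one magic value and one reader class.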

snemarch
I'm using XML serialisation, which has its own format. (And yes, I have a version number within my XML document.) This has been used since day 1, so I can't tell whether a file is compressed. This approach would also suffer from false positives, the same as for the approach in the link in my question. Thanks though!
Luke
@Luke: XML serializing uses a *stream*, not a *file*. So, you can easily open your *file* and check for the header, then treat a subportion of your *file* as the *stream* for the XmlSerializer.
snemarch
@Luke: check my update for some further description and a bit of (pseudo)code.
snemarch
I see what you mean about the difference between file and stream. Yes, I could use a header with XML serialisation. I was hoping someone would say there's a DeflateStream.IsCompressedFile() function, or File.IsCompressed(), or tell me how to use DeflateStream with an uncompressed file, but I guess there's no nice way to do what I want other than adding a bespoke header to the files I use. Thanks for your help on this! I will mark this as the correct answer.
Luke
P.S. Regarding my comment about false positives: That would be extremely unlikely, unless I chose my header to match the standard header for XML files, which would be careless of me!
Luke
+1  A: 

Can't you just create a wrapper class/function for reading the file and catch the exception? Something like

try
{
    // Try return decompressed stream 
}
catch(InvalidDataException e)
{
    // Assume it is already decompressed and return it as it is
}
Svish
Yes, this is option 3 as described in my question. This is what I'm using until I find a more elegant approach (assuming there is one).
Luke
Guess that depends what you define as elegant. I find this one much more elegant and simpler than all the file-header stuff. Especially since you say you are using XML encoding with a version number in it already anyways.
Svish
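
For reference, the try/catch fallback discussed in this answer could be sketched roughly as follows. This is only one possible shape, with hypothetical names (WorkspaceFile, OpenPossiblyCompressed). It decompresses into memory first, so that an InvalidDataException surfaces before any XML parsing starts, and it assumes that uncompressed input makes DeflateStream throw that exception, as reported in the question. In principle, uncompressed bytes could happen to decode as valid deflate data, which is the false-positive risk already mentioned above.

```csharp
using System.IO;
using System.IO.Compression;

static class WorkspaceFile // hypothetical helper
{
    // Returns a stream over the document's plain bytes, whether or not the
    // source was deflate-compressed. 'source' must be seekable.
    public static Stream OpenPossiblyCompressed(Stream source)
    {
        long start = source.Position;
        try
        {
            MemoryStream plain = new MemoryStream();
            // leaveOpen = true so a failed attempt does not close 'source'.
            using (DeflateStream inflater = new DeflateStream(source, CompressionMode.Decompress, true))
            {
                // Manual copy loop: Stream.CopyTo does not exist in .NET 3.5.
                byte[] buffer = new byte[4096];
                int read;
                while ((read = inflater.Read(buffer, 0, buffer.Length)) > 0)
                {
                    plain.Write(buffer, 0, read);
                }
            }
            plain.Position = 0;
            return plain;
        }
        catch (InvalidDataException)
        {
            // Not valid deflate data: assume a legacy uncompressed file and rewind.
            source.Position = start;
            return source;
        }
    }
}
```

The returned stream can be passed straight to XmlSerializer.Deserialize. The exception is still thrown and caught internally for legacy files, so a debugger configured to break on thrown exceptions will still stop there.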