I created a .NET application years ago without thinking too hard about the file format: it uses a SoapFormatter to serialize our large hierarchy of objects. It was dirt simple to set up, so I didn't give it much thought.

I'm now trying to come up with a better file format, given the following issue: when a file is saved, the whole object graph is serialized, converted to a byte array, and sent over the wire to a database for storage. This ends up being a big problem because you have all your objects in memory, then you allocate more memory for the serializer's stream, and then you allocate even more memory for the byte array. Even modestly sized object graphs end up using a lot of memory just to save a file.

I'm not sure how to improve this, both from a file format perspective and from the perspective of the algorithm (objects -> stream -> byte array).
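To make the problem concrete, the save path described above looks roughly like this (type and method names are invented for illustration; the real application's code is not shown here):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Soap;

[Serializable]
public class Document
{
    public string Title;
}

public static class Saver
{
    // The wasteful pipeline: the live object graph (copy 1) is serialized
    // into a MemoryStream (copy 2), which is then flattened into a byte
    // array (copy 3) for the trip to the database.
    public static byte[] Save(object graph)
    {
        var formatter = new SoapFormatter();
        using (var ms = new MemoryStream())
        {
            formatter.Serialize(ms, graph);
            return ms.ToArray();
        }
    }
}
```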

UPDATE: I'd always been zipping the byte array before sending it over the wire, so while that's good advice, it was already implemented in my application.

I did convert from SOAP to binary serialization, and that has made a huge difference: our files are about 7x smaller than before. (Your mileage may vary, of course.)

+2  A: 

One very quick solution, if you haven't tried it already. It won't completely eliminate the overhead, but it will help.

When you serialize your objects to XML, use attributes instead of elements; there is a lot of wasted space in the element form. You can easily accomplish this by adding the [XmlAttribute] attribute above the property / field.

Reference Link: http://msdn.microsoft.com/en-us/library/2baksw0z(VS.71).aspx
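A minimal sketch of the difference (the type and field names here are made up for illustration):

```csharp
using System.IO;
using System.Xml.Serialization;

public class Customer
{
    // With [XmlAttribute] these serialize compactly:
    //   <Customer Id="1" Name="Ann" />
    // Without it, each becomes a child element:
    //   <Customer><Id>1</Id><Name>Ann</Name></Customer>
    [XmlAttribute]
    public int Id;

    [XmlAttribute]
    public string Name;
}

public static class Demo
{
    public static string ToXml(Customer c)
    {
        var serializer = new XmlSerializer(typeof(Customer));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, c);
            return writer.ToString();
        }
    }
}
```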

Cody C
+1  A: 

You could also try using a compressed/zipped stream, I think from memory SharpZipLib allows you to create compressed streams.

Kane
+4  A: 

If you need efficient serialization and don't care that the result is a binary format, just use standard binary serialization in .NET. Decorate your serializable types with the [Serializable] attribute and use the BinaryFormatter to serialize your objects to a byte[].
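A minimal sketch of that (type names invented for illustration; the full compressed variant appears in a later answer):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Invoice
{
    public int Number;
    public decimal Total;
}

public static class BinaryDemo
{
    // Serialize any [Serializable] graph to a compact binary byte[].
    public static byte[] ToBytes(Invoice invoice)
    {
        var formatter = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            formatter.Serialize(ms, invoice);
            return ms.ToArray();
        }
    }
}
```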

jrista
A: 

Why not move the application away from XML to JSON? There are a number of libraries that can serialize/deserialize JSON in .NET.
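A sketch using Json.NET, one such library (the type here is invented for illustration, and this assumes the Newtonsoft.Json assembly is referenced):

```csharp
using Newtonsoft.Json; // Json.NET

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class JsonDemo
{
    // JSON is far terser than SOAP XML for the same data.
    public static string Serialize(Person p)
    {
        return JsonConvert.SerializeObject(p);
    }

    public static Person Deserialize(string json)
    {
        return JsonConvert.DeserializeObject<Person>(json);
    }
}
```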

Richard Clayton
A: 

I have used LZMA to compress data I store to the database; for example, it got one payload from about 36,000 bytes down to 6,000. It is really simple to use, and the data doesn't have to be binary; it can be a string as well.
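A sketch of the compression side, assuming the C# port of the 7-Zip LZMA SDK is referenced (the Encoder API and output layout follow the SDK's sample code; verify against the SDK version you use):

```csharp
using System;
using System.IO;
using SevenZip.Compression.LZMA; // from the 7-Zip LZMA SDK C# port

public static class LzmaHelper
{
    // Writes the SDK's usual layout: 5 coder-property bytes,
    // then the 8-byte original length, then the compressed data.
    public static byte[] Compress(byte[] input)
    {
        var encoder = new Encoder();
        using (var inStream = new MemoryStream(input))
        using (var outStream = new MemoryStream())
        {
            encoder.WriteCoderProperties(outStream);
            outStream.Write(BitConverter.GetBytes((long)input.Length), 0, 8);
            encoder.Code(inStream, outStream, inStream.Length, -1, null);
            return outStream.ToArray();
        }
    }
}
```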

David Basarab
+3  A: 

BinaryFormatter + DeflateStream = Compressed Persisted Objects

using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

namespace CompressedSerialized
{
    class Program
    {
        static void Main(string[] args)
        {
            var obj1 = new MyObject() { Prop1 = "p1", Prop2 = "p2" };
            MyObject obj2 = null;
            var bin = new BinaryFormatter();
            byte[] buffer = null;

            // Serialize through the DeflateStream so the formatter's output
            // is compressed as it is written, with no uncompressed copy.
            using (var ms = new MemoryStream())
            {
                using (var zip = new DeflateStream(ms, CompressionMode.Compress))
                {
                    bin.Serialize(zip, obj1);
                    // Disposing the DeflateStream (end of this using block)
                    // writes the final compressed block into ms.
                }
                buffer = ms.ToArray(); // ToArray works even on a closed MemoryStream
            }

            // Round-trip: decompress and deserialize.
            using (var ms = new MemoryStream(buffer))
            using (var unzip = new DeflateStream(ms, CompressionMode.Decompress))
            {
                var des = bin.Deserialize(unzip);
                obj2 = des as MyObject;
            }
        }
    }

    [Serializable]
    public class MyObject
    {
        public string Prop1 { get; set; }
        public string Prop2 { get; set; }
    }
}
Matthew Whited