I'm using binary serialization (BinaryFormatter) as a temporary mechanism to store state information in a file for a relatively complex (game) object structure. The files are coming out much larger than I expect, and my data structure includes recursive references, so I'm wondering whether the BinaryFormatter is actually storing multiple copies of the same objects, whether my basic "number of objects and values I should have" arithmetic is way off-base, or where else the excessive size is coming from.

Searching on Stack Overflow I was able to find the specification for Microsoft's binary remoting format: http://msdn.microsoft.com/en-us/library/cc236844(PROT.10).aspx

What I can't find is any existing viewer that enables you to "peek" into the contents of a BinaryFormatter output file: get object counts and total bytes for the different object types in the file, etc.

I feel like this must be my "google-fu" failing me (what little I have) - can anyone help? This must have been done before, right??


UPDATE: I could not find one and got no answers, so I put something relatively quick together (link to downloadable project below). I can confirm the BinaryFormatter does not store multiple copies of the same object, but it does write quite a lot of metadata to the stream. If you need efficient storage, build your own custom serialization methods.

+1  A: 

Our application works with massive amounts of data. It can take up 1-2 GB of RAM, like your game. We ran into the same "storing multiple copies of the same objects" problem. Binary serialization also stores too much metadata. When it was first implemented, the serialized file took about 1-2 GB. I have since managed to bring that down to 50-100 MB. Here is what we did.

The short answer: do not use the .NET binary serialization; create your own binary serialization mechanism. We have our own BinaryFormatter class and our own ISerializable interface (with two methods, Serialize and Deserialize).

The same object should not be serialized more than once. We save its unique ID and restore the object from a cache.
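A minimal sketch of the "save a unique ID and restore from a cache" idea, in case it helps. This is not the code from our application: the Node type, ReferenceComparer and NodeSerializer are hypothetical names made up for illustration, and the stream layout (an int ID, with -1 for null) is just one assumed way to do it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical example type; not taken from the thread.
public sealed class Node
{
    public string Name;
    public Node Left;
    public Node Right;
}

// Compares keys by reference, so distinct instances always get distinct IDs
// even if a type overrides Equals.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object a, object b) { return ReferenceEquals(a, b); }
    int IEqualityComparer<object>.GetHashCode(object o) { return RuntimeHelpers.GetHashCode(o); }
}

public static class NodeSerializer
{
    // Writes the object's ID; the body follows only the first time it is seen.
    public static void Write(BinaryWriter w, Node n, Dictionary<object, int> ids)
    {
        if (n == null) { w.Write(-1); return; }                   // -1 marks a null reference
        int id;
        if (ids.TryGetValue(n, out id)) { w.Write(id); return; }  // back-reference only
        id = ids.Count;                                           // IDs are assigned 0, 1, 2, ...
        ids.Add(n, id);
        w.Write(id);
        w.Write(n.Name ?? "");                                    // note: a null name becomes ""
        Write(w, n.Left, ids);
        Write(w, n.Right, ids);
    }

    // An ID below seen.Count is a back-reference; otherwise a new object follows.
    public static Node Read(BinaryReader r, List<Node> seen)
    {
        int id = r.ReadInt32();
        if (id == -1) return null;
        if (id < seen.Count) return seen[id];
        var n = new Node();
        seen.Add(n);                                              // register before children so cycles resolve
        n.Name = r.ReadString();
        n.Left = Read(r, seen);
        n.Right = Read(r, seen);
        return n;
    }
}

public static class Demo
{
    public static void Main()
    {
        var shared = new Node { Name = "shared" };
        var root = new Node { Name = "root", Left = shared, Right = shared };
        using (var ms = new MemoryStream())
        {
            var w = new BinaryWriter(ms);
            NodeSerializer.Write(w, root, new Dictionary<object, int>(new ReferenceComparer()));
            w.Flush();
            ms.Position = 0;
            Node copy = NodeSerializer.Read(new BinaryReader(ms), new List<Node>());
            Console.WriteLine(ReferenceEquals(copy.Left, copy.Right)); // True
        }
    }
}
```

The shared node is written once and referenced by ID the second time, so the deserialized graph keeps a single instance. Registering each object before reading its children is what lets recursive references round-trip.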

I can share some code if you ask.

EDIT: It seems you are correct. See the following code - it proves I was wrong.

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Item
{
    public string Data { get; set; }
}

[Serializable]
public class ItemHolder
{
    public Item Item1 { get; set; }

    public Item Item2 { get; set; }
}

public class Program
{
    public static void Main(params string[] args)
    {
        {
            Item item0 = new Item() { Data = "0000000000" };
            ItemHolder holderOneInstance = new ItemHolder() { Item1 = item0, Item2 = item0 };

            var fs0 = File.Create("temp-file0.txt");
            var formatter0 = new BinaryFormatter();
            formatter0.Serialize(fs0, holderOneInstance);
            fs0.Close();
            Console.WriteLine("One instance: " + new FileInfo(fs0.Name).Length); // 335
            //File.Delete(fs0.Name);
        }

        {
            Item item1 = new Item() { Data = "1111111111" };
            Item item2 = new Item() { Data = "2222222222" };
            ItemHolder holderTwoInstances = new ItemHolder() { Item1 = item1, Item2 = item2 };

            var fs1 = File.Create("temp-file1.txt");
            var formatter1 = new BinaryFormatter();
            formatter1.Serialize(fs1, holderTwoInstances);
            fs1.Close();
            Console.WriteLine("Two instances: " + new FileInfo(fs1.Name).Length); // 360
            //File.Delete(fs1.Name);
        }
    }
}

It looks like BinaryFormatter detects that both properties refer to the same object and writes it only once (Item does not override Equals, so object.Equals here amounts to reference equality).

Have you ever looked inside the generated files? If you open "temp-file0.txt" and "temp-file1.txt" from the code example, you'll see they contain lots of metadata. That's why I recommended creating your own serialization mechanism.

Sorry for being confusing.

Vasiliy Borovyak
thanks - I will ultimately use XML serialization (probably) because human-readability is pretty important to me - but really my goal/question right now is to understand what the BinaryFormatter class is actually storing, so that I can determine whether to focus on implementing my own serialization OR address some other design issue in the data structure itself. I'd like to know what's in the file! :)
Tao
Sure BinaryFormatter stores multiple copies of the same objects. I checked that some time ago. It really does.
Vasiliy Borovyak
Fair enough; I'll hold out for a couple of days to see whether anyone finds/knows of a way to view stats on the contents of the stream as I intended, and otherwise will look into building something.
Tao
Sorry, just one more note on this topic - as far as I could tell (in a hierarchical structure with backreferences, a few thousand objects of various types summing to 10MB serialized) BinaryFormatter does NOT store multiple copies of the same object; I'd be interested to see any evidence to the contrary...
Tao
I've published some code to prove I was wrong.
Vasiliy Borovyak
A: 

Maybe you could run your program in debug mode and try adding a breakpoint.

If that is impossible due to the size of the game or other dependencies, you can always code a simple/small app that includes the deserialization code and peek from debug mode there.

Juan Nunez
Sorry, I don't understand what you mean... I can debug and look at the resulting deserialized object in VS, but that doesn't tell me anything about how many objects of what type were in the serialized stream, what was duplicated, etc (or does it? Am I missing something?)
Tao
My point is, if you can debug you can see the exact state of the serialization stream, the object passed for serialization and its contents.
Juan Nunez
I'm sorry if I'm being dull here, or missing some major feature of VS, but as I don't have access to the internals of the BinaryFormatter, adding a breakpoint will simply allow me to see the stream, and then see the deserialized object; of course, I have the contents of the stream as a file, and I had the object before I even serialized it, so this doesn't give me anything useful. My problem is not being able to "explore" the object itself through the VS IDE; I'd like to understand the structure of the stream data and space usage within it.
Tao
I'm sorry, I really did misunderstand your question. I assume there's no such major feature. My bad.
Juan Nunez
+1  A: 

Vasiliy is right in that I will ultimately need to implement my own formatter/serialization process to better handle versioning and to output a much more compact stream (before compression).

I did want to understand what was happening in the stream, however, so I wrote up a (relatively) quick class that does what I wanted:

  • parses its way through the stream, building a collection of object names, counts and sizes
  • once done, outputs a quick summary of what it found - classes, counts and total sizes in the stream

It's not useful enough for me to put it somewhere visible like codeproject, so I just dumped the project in a zip file on my website: http://www.architectshack.com/BinarySerializationAnalysis.ashx

In my specific case it turns out that the problem was twofold:

  • The BinaryFormatter is VERY verbose (this is known, I just didn't realize the extent)
  • I did have issues in my class: it turned out I was storing objects that I didn't want

Hope this helps someone at some point!

Tao
Make all your enums byte-based (public enum MyEnumName : byte) - you'll save some more space.
Vasiliy Borovyak
thx! didn't know about that!
Tao
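For anyone who wants to see what the enum suggestion in the comment above changes, here is a small sketch. The enum names are hypothetical, and it only shows the in-memory underlying type and size; how much a given serializer actually saves per value is an assumption you would need to measure on your own files.

```csharp
using System;

// Hypothetical enums for illustration; the names are not from the thread.
public enum DefaultSuit { Clubs, Diamonds, Hearts, Spades }         // underlying type defaults to int
public enum CompactSuit : byte { Clubs, Diamonds, Hearts, Spades }  // underlying type is byte

public static class EnumSizeDemo
{
    public static void Main()
    {
        Console.WriteLine(Enum.GetUnderlyingType(typeof(DefaultSuit))); // System.Int32
        Console.WriteLine(Enum.GetUnderlyingType(typeof(CompactSuit))); // System.Byte
        Console.WriteLine(sizeof(DefaultSuit)); // 4
        Console.WriteLine(sizeof(CompactSuit)); // 1
    }
}
```

A byte-based enum only works when the enum has at most 256 distinct values.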