I am using IsolatedStorage in Silverlight 3 to store some settings when a user navigates away from a page hosting the application.

Currently I'm using a DataContractSerializer to write the settings to file. In some circumstances the resulting file is quite large, over 10 MB (much of that size comes from the serializer itself and the XML it generates). This causes problems because

  • I have to request the extra space from the user
  • it is really slow writing the data to file

Can anyone share some strategies they have used for dealing with larger files in IsolatedStorage?

  • how do you determine the likely amount of disk space you will need?
  • do you use a DataContract or Xml Serializer and then zip the result before saving?
  • or do you use some sort of binary/custom serialization? If so, did you gain any substantial space or time savings?
  • is there some way of declaratively saying your application requires a certain quota, so that the user doesn't have to be prompted at some arbitrary point?

I personally don't like writing large quantities of data to file like this, but I need to know all the available options before I explain the issues to a product manager and persuade them to change the requirements.

Thanks!

+4  A: 

slugster,

You may want to consider switching over to XmlSerializer instead. Here is what I have determined over time:

The XmlSerializer and DataContractSerializer (DCS) classes provide a simple means of serializing and deserializing object graphs to and from XML.

The key differences are:

1.
  • XmlSerializer has a much smaller payload than DCS if you use [XmlAttribute] instead of [XmlElement]
  • DCS always stores values as elements

2.
  • DCS is "opt-in" rather than "opt-out"
  • With DCS you explicitly mark what you want to serialize with [DataMember]
  • With DCS you can serialize any field or property, even if they are marked protected or private
  • With DCS you can use [IgnoreDataMember] to have the serializer ignore certain properties
  • With XmlSerializer public properties are serialized, and need setters to be deserialized
  • With XmlSerializer you can use [XmlIgnore] to have the serializer ignore public properties

3.
  • BE AWARE! DCS.ReadObject DOES NOT call constructors during deserialization
  • If you need to perform initialization, DCS supports the following callback hooks: [OnDeserializing], [OnDeserialized], [OnSerializing], [OnSerialized] (also useful for handling versioning issues)
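
To illustrate point 3, here is a minimal sketch of an [OnDeserializing] hook that repeats the initialization the constructor would normally do. The CachedSettings class and its members are hypothetical names made up for this example, and the callback is declared public on the assumption that Silverlight's restricted reflection may not reach a private one.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical settings type, not taken from the question or the answer.
[DataContract]
public class CachedSettings
{
    [DataMember]
    public string Name { get; set; }

    // Not a [DataMember], so it is never written to or read from the XML.
    public List<string> Log;

    public CachedSettings()
    {
        Initialize();
    }

    // ReadObject skips the constructor, so the serializer calls this
    // before it starts populating the data members.
    [OnDeserializing]
    public void OnDeserializing(StreamingContext context)
    {
        Initialize();
    }

    private void Initialize()
    {
        Log = new List<string>();
    }
}
```

Without the callback, Log would be null on any instance produced by ReadObject, because the field initialization in the constructor never runs.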

If you want the ability to switch between the two serializers, you can use both sets of attributes simultaneously, as in:

    [DataContract]
    [XmlRoot]
    public class ProfilePerson : NotifyPropertyChanges
    {
        [XmlAttribute]
        [DataMember]
        public string FirstName { get { return m_FirstName; } set { SetProperty(ref m_FirstName, value); } }
        private string m_FirstName;

        [XmlElement]
        [DataMember]
        public PersonLocation Location { get { return m_Location; } set { SetProperty(ref m_Location, value); } }
        private PersonLocation m_Location = new PersonLocation(); // Should change over time

        [XmlIgnore]
        [IgnoreDataMember]
        public Profile ParentProfile { get { return m_ParentProfile; } set { SetProperty(ref m_ParentProfile, value); } }
        private Profile m_ParentProfile = null;

        public ProfilePerson()
        {
        }
    }

Also, check out my Serializer class that can switch between the two:

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace ClassLibrary
{
    // Instantiate this class to serialize objects using either XmlSerializer or DataContractSerializer
    internal class Serializer
    {
        private readonly bool m_bDCS;

        internal Serializer(bool bDCS)
        {
            m_bDCS = bDCS;
        }

        internal TT Deserialize<TT>(string input)
        {
            // Encoding.UTF8.GetBytes stands in for the ToByteArray() extension method, which is not shown here
            using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
            {
                if (m_bDCS)
                {
                    DataContractSerializer dc = new DataContractSerializer(typeof(TT));
                    return (TT)dc.ReadObject(stream);
                }
                else
                {
                    XmlSerializer xs = new XmlSerializer(typeof(TT));
                    return (TT)xs.Deserialize(stream);
                }
            }
        }

        internal string Serialize<TT>(object obj)
        {
            using (MemoryStream stream = new MemoryStream())
            {
                if (m_bDCS)
                {
                    DataContractSerializer dc = new DataContractSerializer(typeof(TT));
                    dc.WriteObject(stream, obj);
                }
                else
                {
                    XmlSerializer xs = new XmlSerializer(typeof(TT));
                    xs.Serialize(stream, obj);
                }

                // be aware that the Unicode Byte-Order Mark will be at the front of the string
                // Encoding.UTF8.GetString stands in for the ToUtfString() extension method, which is not shown here
                byte[] bytes = stream.ToArray();
                return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
            }
        }

        internal string SerializeToString<TT>(object obj)
        {
            StringBuilder builder = new StringBuilder();

            // Dispose the writer before reading the builder so that any buffered output is flushed
            using (XmlWriter xmlWriter = XmlWriter.Create(builder))
            {
                if (m_bDCS)
                {
                    DataContractSerializer dc = new DataContractSerializer(typeof(TT));
                    dc.WriteObject(xmlWriter, obj);
                }
                else
                {
                    XmlSerializer xs = new XmlSerializer(typeof(TT));
                    xs.Serialize(xmlWriter, obj);
                }
            }

            // RegexHelper and WildcardSearch are my own helpers (not shown here);
            // they strip the XML declaration and the namespace declarations
            string xml = builder.ToString();
            xml = RegexHelper.ReplacePattern(xml, RegexHelper.WildcardToPattern("<?xml*>", WildcardSearch.Anywhere), string.Empty);
            xml = RegexHelper.ReplacePattern(xml, RegexHelper.WildcardToPattern(" xmlns:*\"*\"", WildcardSearch.Anywhere), string.Empty);
            xml = xml.Replace(Environment.NewLine + "  ", string.Empty);
            xml = xml.Replace(Environment.NewLine, string.Empty);
            return xml;
        }
    }
}


Jim McCurdy
Thanks Jim, great answer. Unfortunately I cannot use the XmlSerializer due to namespace issues and identical-looking objects, hence my use of the DCS. And point #3 about the DCS was also good, I'm sure that has caught a few people out in the past :)
slugster
Jim, I know premature optimisation is the root of most evil, but `new DataContractSerializer(typeof(TT))` and `new XmlSerializer(typeof(TT))` are *very* expensive operations. You could cache these in a `static Dictionary<Type, XmlSerializer>` etc. if you found that this code was getting slow.
Rob Fonseca-Ensor
Rob, Good observation about performance. I'll make your suggested changes in my apps. Thanks...
Jim McCurdy
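
The caching idea from the comments could look roughly like the sketch below. SerializerCache and its For method are hypothetical names, not part of Jim's class; the same pattern would apply to DataContractSerializer with a second dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical helper: keeps one XmlSerializer per type so the
// expensive constructor runs only once for each type.
public static class SerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> s_cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object s_lock = new object();

    public static XmlSerializer For(Type type)
    {
        // Dictionary is not safe for concurrent writes, so guard every access.
        lock (s_lock)
        {
            XmlSerializer serializer;
            if (!s_cache.TryGetValue(type, out serializer))
            {
                serializer = new XmlSerializer(type);
                s_cache[type] = serializer;
            }
            return serializer;
        }
    }
}
```

A call such as SerializerCache.For(typeof(TT)) would then replace new XmlSerializer(typeof(TT)) inside Serialize and Deserialize.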
+1  A: 

Another alternative is to zip the contents of the XML serialization. We also have a large serialization that has a rough compression ratio of 10-to-1. Of course the compression can take a fair bit of CPU to do its magic. We spawn off the compression in a thread to make sure the user interface doesn't slow down. We are using a modified SharpZipLib that works under Silverlight.
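
As a rough illustration of the serialize-then-compress idea, the sketch below pipes DataContractSerializer output through GZipStream. This is an assumption for illustration only: GZipStream comes from the desktop .NET Framework and is not available in Silverlight 3, where a library such as the modified SharpZipLib mentioned above would take its place. CompressedSerializer and its method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization;

// Hypothetical helper: serializes to XML and gzips it in one pass.
public static class CompressedSerializer
{
    public static byte[] SerializeCompressed<T>(T value)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final compressed block.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                new DataContractSerializer(typeof(T)).WriteObject(gzip, value);
            }
            // ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    public static T DeserializeCompressed<T>(byte[] data)
    {
        using (MemoryStream input = new MemoryStream(data))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            return (T)new DataContractSerializer(typeof(T)).ReadObject(gzip);
        }
    }
}
```

Because the serializer writes straight into the compressing stream, the uncompressed XML is never held in memory as a whole; the resulting byte array can be written to an IsolatedStorageFileStream as-is.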

Joel Lucsy
I implemented zipping on the serialized stream today, and it certainly made a noticeable difference: a file that was previously 10.1 MB was reduced to 260 KB, and it took approx 3 seconds to serialize, zip and save the data. When dealing with small amounts of data there was no perceptible increase in time from the extra step of zipping the data, so the transition away from the page was taking well under a second. Overall, I would have to say that zipping all data was a good option.
slugster
+1  A: 

Another option is to serialize to json. I do not know about performance, but I just compared the output when serializing a fairly complex list of entities to json vs. xml, and json is much more compact. Using json the resulting string was 1301303 bytes. With xml, 2429630. So it's almost half the size using json.

Below is the helper class I use when serializing/deserializing to json.

EDIT
I did some performance testing, and it actually turns out that json is faster as well. With xml, serializing 10000 objects took 636 milliseconds, with json only 257. Does anybody know if there are reasons not to choose json over xml?

EDIT
Tested again, with real data this time:
(1000 objects)
Uncompressed json: 605 KB
Uncompressed xml: 3.53 MB (!)
Zipped json: 28.5 KB
Zipped xml: 69.9 KB
Performance when using pre-initialized serializer:
json: ~350 ms
xml: ~120 ms

using System;
using System.IO;
using System.Text;
using System.Runtime.Serialization.Json;

namespace GLS.Gui.Helper
{
    public static class SerializationHelper
    {
        public static string SerializeToJsonString(object objectToSerialize)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                DataContractJsonSerializer serializer = new DataContractJsonSerializer(objectToSerialize.GetType());
                serializer.WriteObject(ms, objectToSerialize);
                ms.Position = 0;

                using (StreamReader reader = new StreamReader(ms))
                {
                    return reader.ReadToEnd();
                }
            }
        }
        public static T Deserialize<T>(string jsonString)
        {
            using (MemoryStream ms = new MemoryStream(Encoding.Unicode.GetBytes(jsonString)))
            {
                DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(T));

                return (T)serializer.ReadObject(ms);
            }
        }

    }
}
Henrik Söderlund
Out of curiosity, have you compared the file sizes of *zipped* xml and json?
Rob Fonseca-Ensor
I did, and the results were pretty much the same. But that was with test data only. My data entities were named "entity 1", "entity 2" etc. up to 10000. With that much repeated data I got a compression rate of about 99%. I think the zipped json file was about 13 KB and the xml was around 28 KB. But to get useful results I would have to start playing around with real data, or at least randomizing names, child collections etc. And I am just a little too lazy to do that. :)
Henrik Söderlund
Turns out I wasn't too lazy after all. See the test results in the edited answer.
Henrik Söderlund
Ok, I finally found a reason to use xml over json. When caching the instances of the serializers, like Rob Fonseca-Ensor suggests in his comment above, xml is a lot faster than json. More than twice as fast actually.
Henrik Söderlund
+1  A: 

I have a compact binary serializer class for Silverlight and .NET that creates reasonably compact representations of an object graph - I had to build it for the same reason (and the cost of sending stuff over the wire to my WCF service).

You can find the code and a further description on my blog

Mike Talbot