He is both right and wrong at the same time.
With things like BinaryFormatter, this is a non-issue; the serialized stream contains full type metadata, so if you have:
[Serializable] abstract class SomeBase {}
[Serializable] class SomeConcrete : SomeBase {}
...
SomeBase obj = new SomeConcrete();
and serialize obj, then it includes "I'm a SomeConcrete" in the stream. This makes life simple, but is verbose, especially when repeated. It is also brittle, as it demands the same implementation when deserializing; that is bad both for differing client/server implementations and for long-term storage.
With XmlSerializer (which I guess the blog is talking about), there is no type metadata in the stream; instead, the element names (or the xsi:type attributes) are used to identify which type is meant. For this to work, the serializer needs to know in advance which names map to which types.
The simplest way to do this is to decorate the base class with the subclasses we know about. The serializer can then inspect each of these (and any additional xml-specific attributes) to figure out that when it sees a <someConcreteType> element, that maps to a SomeConcrete instance (note that the names don't need to match, so it can't just look for it by name).
[XmlInclude(typeof(SomeConcrete))]
public abstract class SomeBase {}
public class SomeConcrete : SomeBase {}
...
SomeBase obj = new SomeConcrete();
XmlSerializer ser = new XmlSerializer(typeof(SomeBase));
ser.Serialize(Console.Out, obj);
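To see that the [XmlInclude] mapping works in both directions, here is a minimal round-trip sketch. RoundTripDemo and RoundTrip are hypothetical names introduced only for this illustration; the rest is the same setup as above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlInclude(typeof(SomeConcrete))]
public abstract class SomeBase { }
public class SomeConcrete : SomeBase { }

// Hypothetical helper, for illustration only.
public static class RoundTripDemo
{
    // Serializes via the base type, then reads the xml back in.
    public static SomeBase RoundTrip(SomeBase obj)
    {
        XmlSerializer ser = new XmlSerializer(typeof(SomeBase));
        using (StringWriter writer = new StringWriter())
        {
            ser.Serialize(writer, obj);
            using (StringReader reader = new StringReader(writer.ToString()))
            {
                return (SomeBase)ser.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        SomeBase copy = RoundTrip(new SomeConcrete());
        // The xsi:type attribute lets the serializer pick the subclass again.
        Console.WriteLine(copy.GetType().Name); // SomeConcrete
    }
}
```

Without the [XmlInclude] line, the Serialize call should fail with an InvalidOperationException, because the serializer was never told about SomeConcrete.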
However, if he is a purist (or the data isn't available), then there is an alternative; you can specify all this data separately via an overloaded constructor of XmlSerializer. For example, you might look up the set of known subtypes from configuration (or maybe an IoC container), and set up the constructor manually. This isn't very tricky, but it is tricky enough that it isn't worth it unless you actually need it.
public abstract class SomeBase { } // no [XmlInclude]
public class SomeConcrete : SomeBase { }
...
SomeBase obj = new SomeConcrete();
Type[] extras = {typeof(SomeConcrete)}; // from config
XmlSerializer ser = new XmlSerializer(typeof(SomeBase), extras);
ser.Serialize(Console.Out, obj);
Additionally, with XmlSerializer, if you go the custom ctor route, it is important to cache and re-use the XmlSerializer instance; otherwise a new dynamic assembly is generated and loaded per usage, which is very expensive (they can't be unloaded). If you use the simple constructor, it caches and re-uses the model, so only a single model is used.
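One way to do that caching is sketched below. SerializerCache and its Get method are hypothetical names, not part of the framework; the assumption is simply that a serializer is fully determined by the root type plus the set of extra types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Xml.Serialization;

public abstract class SomeBase { } // no [XmlInclude]
public class SomeConcrete : SomeBase { }

// Hypothetical cache, for illustration only.
public static class SerializerCache
{
    private static readonly ConcurrentDictionary<string, XmlSerializer> Cache =
        new ConcurrentDictionary<string, XmlSerializer>();

    public static XmlSerializer Get(Type root, params Type[] extraTypes)
    {
        // Key on the root plus the sorted extra type names, so that
        // equivalent requests share a single serializer instance.
        string key = root.AssemblyQualifiedName + "|" + string.Join(";",
            extraTypes.Select(t => t.AssemblyQualifiedName)
                      .OrderBy(n => n, StringComparer.Ordinal));

        // Under a race the factory may run more than once, but only one
        // result is kept; wrap the value in Lazy<T> if that matters to you.
        return Cache.GetOrAdd(key, _ => new XmlSerializer(root, extraTypes));
    }
}
```

Usage would then be SerializerCache.Get(typeof(SomeBase), typeof(SomeConcrete)) wherever you would otherwise have called the constructor directly.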
YAGNI dictates that we should choose the simplest option; using [XmlInclude] removes the need for a complex constructor, and removes the need to worry about caching the serializer. The other option is there and is fully supported, though.
Re your follow-up questions:
By "factory pattern", he's talking about the case where your code doesn't know about SomeConcrete, perhaps due to IoC/DI or similar frameworks; so you might have:
SomeBase obj = MyFactory.Create(typeof(SomeBase), someArgsMaybe);
This figures out the appropriate SomeBase concrete implementation, instantiates it and hands it back. Obviously, if our code doesn't know about the concrete types (because they are only specified in a config file), then we can't use XmlInclude; but we can parse the config data and use the ctor approach (as above). In reality, most times XmlSerializer is used with POCO/DTO entities, so this is an artificial concern.
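As a rough sketch of the "parse the config data" step: assume the config gives you a list of type names (how you read them is up to you). KnownTypes and Resolve are hypothetical names; Type.GetType and IsAssignableFrom are the standard reflection calls.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

public abstract class SomeBase { } // no [XmlInclude]
public class SomeConcrete : SomeBase { }

// Hypothetical helper, for illustration only.
public static class KnownTypes
{
    // Turns type names (e.g. read from a config file) into the Type[]
    // expected by the XmlSerializer(Type, Type[]) constructor.
    public static Type[] Resolve(IEnumerable<string> typeNames, Type baseType)
    {
        List<Type> result = new List<Type>();
        foreach (string name in typeNames)
        {
            // Returns null rather than throwing for unknown names. Types in
            // other assemblies need an assembly-qualified name.
            Type type = Type.GetType(name, false);
            if (type != null && !type.IsAbstract && baseType.IsAssignableFrom(type))
            {
                result.Add(type);
            }
        }
        return result.ToArray();
    }
}
```

The result can be passed straight to new XmlSerializer(typeof(SomeBase), extras); silently skipping unknown names is a design choice here, and you may prefer to throw instead.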
And re interfaces; same thing, but more flexible (an interface doesn't demand a type hierarchy). But XmlSerializer doesn't support this model. Frankly, tough; that isn't its job. Its job is to allow you to store and transport data. Not implementation. Any xml-schema generated entities won't have methods. Data is concrete, not abstract. As long as you think "DTO", the interface debate is a non-issue. People who are vexed by not being able to use interfaces on their boundary haven't embraced separation of concerns, i.e. they are trying to do:
Client runtime entities <---transport---> Server runtime entities
rather than the less restrictive
Client runtime entities <---> Client DTO <---transport---> Server DTO <---> Server runtime entities
Now, in many (most?) cases the DTO and entities can be the same; but if you are trying to do something that the transport doesn't like, introduce a DTO; don't fight the serializer. The same logic applies when people are struggling to write their object:
class Person {
    public string AddressLine1 {get;set;}
    public string AddressLine2 {get;set;}
}
as xml of the form:
<person>
<address line1="..." line2="..."/>
</person>
If you want this, introduce a DTO that corresponds to the transport, and map between your entity and the DTO:
// (in a different namespace for the DTO stuff)
[XmlType("person"), XmlRoot("person")]
public class Person {
    [XmlElement("address")]
    public Address Address {get;set;}
}
public class Address {
    [XmlAttribute("line1")] public string Line1 {get;set;}
    [XmlAttribute("line2")] public string Line2 {get;set;}
}
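The "map between your entity and the DTO" part might look something like the sketch below. The Domain and Transport namespaces and the PersonMapper class are hypothetical names chosen for this illustration; a mapping library would do the same job.

```csharp
using System;
using System.Xml.Serialization;

namespace Domain
{
    // The runtime entity, shaped for the business code.
    public class Person
    {
        public string AddressLine1 { get; set; }
        public string AddressLine2 { get; set; }
    }
}

namespace Transport
{
    // The DTO, shaped for the xml.
    [XmlType("person"), XmlRoot("person")]
    public class Person
    {
        [XmlElement("address")]
        public Address Address { get; set; }
    }

    public class Address
    {
        [XmlAttribute("line1")] public string Line1 { get; set; }
        [XmlAttribute("line2")] public string Line2 { get; set; }
    }

    // Hypothetical hand-written mapper, for illustration only.
    public static class PersonMapper
    {
        public static Person ToDto(Domain.Person entity)
        {
            return new Person
            {
                Address = new Address
                {
                    Line1 = entity.AddressLine1,
                    Line2 = entity.AddressLine2
                }
            };
        }

        public static Domain.Person FromDto(Person dto)
        {
            // The <address> element may be missing from the incoming xml.
            Address address = dto.Address ?? new Address();
            return new Domain.Person
            {
                AddressLine1 = address.Line1,
                AddressLine2 = address.Line2
            };
        }
    }
}
```

Only Transport.Person ever goes near XmlSerializer; Domain.Person is free to be immutable, interface-based, or whatever else the business code wants.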
This also applies to all those other niggles like:
- why do I need a parameterless constructor?
- why do I need a setter for my collection properties?
- why can't I use an immutable type?
- why must my type be public?
- how do I handle complex versioning?
- how do I handle different clients with different data layouts?
- why can't I use interfaces?
- etc, etc
You don't always have these problems; but if you do - introduce a DTO (or several) and your problems go away. Taking this back to the question about interfaces; the DTO types might not be interface-based, but your runtime/business types can be.