I have an entity that, in addition to a few common properties, contains a list of extended properties stored as (Name, Value) pairs of strings within a collection. I should mention that these extended properties vary widely from instance to instance, and that they only need to be listed for each instance (there won't be any queries over the extended properties, such as finding all instances with a particular (Name, Value) pair). I'm exploring how I might persist this entity using Windows Azure Table Services. With the particular approach I'm testing now, I'm concerned that performance may degrade over time as the application encounters more and more distinct extended property names.
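To make the shape concrete, the entity looks roughly like this (a sketch; Title is just a placeholder common property, and TableServiceEntity is the StorageClient base class that supplies PartitionKey, RowKey, and Timestamp):

public class Property
{
    public string Name { get; set; }
    public string Value { get; set; }

    public Property(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

public class MyEntity : TableServiceEntity
{
    // Common properties, declared at compile time.
    public string Title { get; set; }

    // Extended (Name, Value) pairs, varying from instance to instance;
    // these are serialized by hand in the event handlers below.
    public Property[] ExtendedProperties { get; set; }
}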
If I were storing this entity in a typical relational database, I'd probably have two tables to support this schema: the first would contain the entity identifier and its common properties, and the second would reference the entity identifier and use EAV-style row modeling to store the extended (Name, Value) pairs, one pair per row.
Since tables in Windows Azure already use an EAV model, I'm considering custom serialization of my entity so that the extended properties are stored as though they had been declared on the entity at compile time. I can use the ReadingEntity and WritingEntity events provided by DataServiceContext to accomplish this.
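For reference, the Atom, Meta, and Data fields used by the handlers are the standard Atom and ADO.NET Data Services namespaces:

private static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";
private static readonly XNamespace Meta = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
private static readonly XNamespace Data = "http://schemas.microsoft.com/ado/2007/08/dataservices";

Here are the two handlers: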
private void OnReadingEntity(object sender, ReadingWritingEntityEventArgs e)
{
    MyEntity entry = e.Entity as MyEntity;
    if (entry != null)
    {
        // Locate the property bag inside the Atom entry.
        XElement properties = e.Data
            .Element(Atom + "content")
            .Element(Meta + "properties");

        // Select the extended properties from the payload, skipping the
        // reserved system properties and any empty values.
        entry.ExtendedProperties = (from p in properties.Elements()
                                    where p.Name.Namespace == Data
                                       && !IsReservedPropertyName(p.Name.LocalName)
                                       && !string.IsNullOrEmpty(p.Value)
                                    select new Property(p.Name.LocalName, p.Value)).ToArray();
    }
}
private void OnWritingEntity(object sender, ReadingWritingEntityEventArgs e)
{
    MyEntity entry = e.Entity as MyEntity;
    if (entry != null)
    {
        // Locate the property bag inside the Atom entry.
        XElement properties = e.Data
            .Element(Atom + "content")
            .Element(Meta + "properties");

        // Append each valid extended property to the payload.
        foreach (Property p in (from p in entry.ExtendedProperties
                                where !IsReservedPropertyName(p.Name)
                                   && !string.IsNullOrEmpty(p.Value)
                                select p))
        {
            properties.Add(new XElement(Data + p.Name, p.Value));
        }
    }
}
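Both handlers are attached to the context before any round trips; against development storage the wiring looks something like this:

CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();
context.ReadingEntity += OnReadingEntity;
context.WritingEntity += OnWritingEntity;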
This works, and since I can define requirements for extended property names and values, I can ensure that they conform to all the standard requirements for entity properties within a Windows Azure Table.
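For completeness, IsReservedPropertyName just guards against the system properties that every table entity already carries; a minimal sketch:

private static readonly HashSet<string> ReservedPropertyNames =
    new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "PartitionKey", "RowKey", "Timestamp"
    };

private static bool IsReservedPropertyName(string name)
{
    return ReservedPropertyNames.Contains(name);
}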
So what happens over time as the application encounters thousands of different extended property names?
Here's what I've observed within the development storage environment:
The table container schema grows with each new name. I'm not sure exactly how this schema is used (probably to produce the behavior in the next point), but this XML document could obviously grow quite large over time.
Whenever an instance is read, the XML passed to OnReadingEntity contains elements for every property name ever stored for any instance in the table (not just the ones stored for the particular instance being read). This means that retrieval of an entity will become slower over time.
Should I expect these behaviors in the production storage environment? I can see how these behaviors would be acceptable for most tables, as the schema would be mostly static over time. Perhaps Windows Azure Tables were not designed to be used like this? If so, I will certainly need to change my approach. I'm also open to suggestions on alternate approaches.