views: 52

answers: 1
I have an entity that, in addition to a few common properties, contains a list of extended properties stored as (Name, Value) pairs of strings within a collection. I should mention that these extended properties vary widely from instance to instance, and that they only need to be listed for each instance (there won't be any queries over the extended properties, for example, finding all instances with a particular (Name, Value) pair). I'm exploring how I might persist this entity using Windows Azure Table Services. With the particular approach I'm testing now, I'm concerned that performance may degrade over time as the application encounters more distinct extended property names.
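For context, a minimal sketch of the shape such an entity might take (deriving from TableServiceEntity and the placeholder common property Title are assumptions; MyEntity, Property, and ExtendedProperties match the code below):

// Sketch of one possible shape for the entity described above. The base class and the
// placeholder "Title" property are assumptions; the rest mirrors the handlers below.
public class Property
{
    public Property(string name, string value)
    {
        Name = name;
        Value = value;
    }

    public string Name { get; set; }
    public string Value { get; set; }
}

public class MyEntity : TableServiceEntity // PartitionKey, RowKey, Timestamp
{
    // Common properties declared at compile time.
    public string Title { get; set; }

    // Extended (Name, Value) pairs; these vary from instance to instance and are
    // serialized by hand in the ReadingEntity/WritingEntity handlers below.
    public Property[] ExtendedProperties { get; set; }
}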

If I were storing this entity in a typical relational database, I'd probably have two tables to support this schema: the first would contain the entity identifier and its common properties, and the second would reference the entity identifier and use EAV style row-modeling to store the extended (Name, Value) pairs, one to each row.

Since tables in Windows Azure already use an EAV model, I'm considering custom serialization of my entity so that the extended properties are stored as though they were declared at compile time for the entity. I can use the ReadingEntity and WritingEntity events provided by DataServiceContext to accomplish this.

// Atom and ADO.NET Data Services namespaces used in the entity payload
private static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";
private static readonly XNamespace Meta = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
private static readonly XNamespace Data = "http://schemas.microsoft.com/ado/2007/08/dataservices";

private void OnReadingEntity(object sender, ReadingWritingEntityEventArgs e)
{
    MyEntity Entry = e.Entity as MyEntity;

    if (Entry != null)
    {
        XElement Properties = e.Data
            .Element(Atom + "content")
            .Element(Meta + "properties");

        // select the extended (Name, Value) pairs from the payload's data elements
        Entry.ExtendedProperties = (from p in Properties.Elements()
                                    where p.Name.Namespace == Data
                                          && !IsReservedPropertyName(p.Name.LocalName)
                                          && !string.IsNullOrEmpty(p.Value)
                                    select new Property(p.Name.LocalName, p.Value)).ToArray();
    }
}

private void OnWritingEntity(object sender, ReadingWritingEntityEventArgs e)
{
    MyEntity Entry = e.Entity as MyEntity;

    if (Entry != null)
    {
        XElement Properties = e.Data
            .Element(Atom + "content")
            .Element(Meta + "properties");

        // add the qualifying extended properties to the payload as data elements
        foreach (Property p in (from p in Entry.ExtendedProperties 
                                where !IsReservedPropertyName(p.Name) && !string.IsNullOrEmpty(p.Value)
                                select p))
        {
            Properties.Add(new XElement(Data + p.Name, p.Value));
        }
    }
}
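For completeness, the two handlers above get attached to the context roughly like this (a sketch assuming the Microsoft.WindowsAzure.StorageClient library and development storage; the variable names are placeholders):

// Sketch: wiring the handlers above to a table service context.
CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

context.ReadingEntity += OnReadingEntity; // fires as each entity's Atom entry is materialized
context.WritingEntity += OnWritingEntity; // fires as each entity's Atom entry is serialized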

This works, and since I control the requirements for extended property names and values, I can ensure that they conform to all the standard rules for entity properties within a Windows Azure table.
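A minimal version of the IsReservedPropertyName check used above could look something like this (treating only the three system properties as reserved is a simplification; any additional naming rules would go here):

// Sketch of the reserved-name check used in the handlers above.
private static readonly HashSet<string> ReservedPropertyNames =
    new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "PartitionKey", "RowKey", "Timestamp" };

private static bool IsReservedPropertyName(string name)
{
    return ReservedPropertyNames.Contains(name);
}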

So what happens over time as the application encounters thousands of different extended property names?

Here's what I've observed within the development storage environment:

  • The table container schema grows with each new property name. I'm not sure exactly how this schema is used (probably for the next point), but this XML document could clearly grow quite large over time.

  • Whenever an instance is read, the XML passed to OnReadingEntity contains elements for every property name ever stored for any instance (not just the ones stored for the particular instance being read). This means that retrieving an entity will become slower over time.

Should I expect these behaviors in the production storage environment? I can see how these behaviors would be acceptable for most tables, as the schema would be mostly static over time. Perhaps Windows Azure Tables were not designed to be used like this? If so, I will certainly need to change my approach. I'm also open to suggestions on alternate approaches.

+3  A: 

Development storage uses SQL Express to simulate cloud table storage. Ignore what you see there... the production storage system doesn't store any schema, so there's no overhead to having lots of unique properties in a table.

smarx
To add to that, in the production storage system you shouldn't see XML coming back for properties that don't exist on an entity. I think what you're doing is exactly the right way to handle your scenario.
smarx
Thanks! I figured this was probably the case but could not find any documentation that explicitly indicated either way.
Michael Petito