Let me first describe the situation. We host many Alumni events over the course of each year and provide online registration forms for each event. There is a large chunk of data that is common for each event:

  1. An Event with dates, times, managers, internal billing info, etc.
  2. A Registration record with info about the payment and total amount charged per form submission
  3. Bio/Demographic and alumni data about the 1 or more attendees (name, address, degree, etc.)

We store all of the above data within columns in tables as you would expect.

The trouble comes with the 'extra' fields we are asked to put on the forms. Maybe it is a dinner with a Veggie or Carnivore option, perhaps there is lodging with bed or smoking options, or perhaps there is an optional transportation add-on. We receive tons of weird little "can you add this to the form?" requests.

Currently, we JSONify any non-standard data and store it all in one column (per attendee) called 'extras'. We can read this data out in code but it is not well suited to querying. Our internal staff would like to generate a quick report on Veggie dinners needed for instance.

Other than creating a separate table for each form that holds the specific 'extra' data items, are there any other approaches that could make my life (and reporting) easier? Anyone working in a similar environment?

+2  A: 

Have you considered using XML instead of JSON? The difference: SQL Server supports XML natively (a dedicated data type) and has query integration ;)
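
Roughly, something like this (table and column names are made up for illustration):

    -- Hypothetical 'extras' xml column on the attendee table, holding e.g.
    -- <extras><dinner>Veggie</dinner><lodging>NonSmoking</lodging></extras>
    ALTER TABLE dbo.Attendee ADD extras xml NULL;

    -- Quick report: how many Veggie dinners are needed for a given event
    SELECT COUNT(*) AS VeggieDinners
    FROM dbo.Attendee AS a
    JOIN dbo.Registration AS r ON r.RegistrationID = a.RegistrationID
    WHERE r.EventID = 42  -- whichever event you are reporting on
      AND a.extras.exist('/extras/dinner[. = "Veggie"]') = 1;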

TomTom
Actually yes - I was considering XML. Any good XML/SQL online articles out there for newbies?
indyDean
Nothing except the documentation that I know of, sorry.
TomTom
A: 

Quick and dirty, but actually nice for querying: simply add new columns. It's not like the empty entries in the existing table should cost a lot.

A more database-y solution: you'll have something like an event ID in your table. You can link this to an n:m table connecting events to additional fields, and then store the additional field data in a table with additional_field_id, record_id (from the original table) and the actual value. It probably creates ugly queries, but seems politically correct in terms of database design.
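
Roughly like this, including the kind of query it leads to (all names are made up):

    -- Reusable definitions of the extra fields
    CREATE TABLE dbo.AdditionalField (
        AdditionalFieldID int IDENTITY PRIMARY KEY,
        FieldName         nvarchar(100) NOT NULL
    );

    -- n:m table: which extra fields appear on which event's form
    CREATE TABLE dbo.EventAdditionalField (
        EventID           int NOT NULL,
        AdditionalFieldID int NOT NULL,
        PRIMARY KEY (EventID, AdditionalFieldID)
    );

    -- The actual values: one row per field per original record (e.g. per attendee)
    CREATE TABLE dbo.AdditionalFieldValue (
        AdditionalFieldID int NOT NULL,
        RecordID          int NOT NULL,   -- e.g. AttendeeID from the original table
        FieldValue        nvarchar(500) NULL,
        PRIMARY KEY (AdditionalFieldID, RecordID)
    );

    -- The 'Veggie dinners' report then becomes a join against the value rows
    SELECT COUNT(*) AS VeggieDinners
    FROM dbo.AdditionalFieldValue AS v
    JOIN dbo.AdditionalField AS f ON f.AdditionalFieldID = v.AdditionalFieldID
    WHERE f.FieldName = 'DinnerChoice'
      AND v.FieldValue = 'Veggie';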

I understand "NoSQL" (not only sql ;) databases like couchdb let you store arbitrary fields per record, but since you're already with SQL Server, I guess that's not an option.

Nicolas78
Except that I'm guessing there are a variable number of "extra" fields that could be added. That is, sometimes there would just be one or two, but there could be many more. Scaling becomes a problem very quickly.
AllenG
well a quick search for "maximum number of columns in sql server" turns up the number 1024, and that's from 2005. that should be able to hold a *lot* of eventualities ;) But I agree it's not a nice solution. Although it's probably the one which makes those people happy who want to run a simple query like "what's the percentage of vegetarians over all events, where this was an option?"
Nicolas78
+3  A: 

Well you could also have the following db structure:

Have a table to store custom attributes

AttributeID
AttributeName

Have a mapping table between events and attributes with:

AttributeID
EventID
AttributeValue

This means you will be able to store custom information per event, and you will be able to reuse your attributes. You can also add some metadata such as

AttributeType
AllowBlankValue 

to the attribute to handle it easily afterwards
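
A minimal sketch of those two tables with the metadata columns (names are illustrative):

    -- Custom attribute definitions, reusable across events
    CREATE TABLE dbo.Attribute (
        AttributeID     int IDENTITY PRIMARY KEY,
        AttributeName   nvarchar(100) NOT NULL,
        AttributeType   nvarchar(50)  NOT NULL,   -- e.g. 'text', 'option', 'bool'
        AllowBlankValue bit           NOT NULL DEFAULT (1)
    );

    -- Mapping table between events and attributes, holding the value
    CREATE TABLE dbo.EventAttribute (
        EventID        int NOT NULL,
        AttributeID    int NOT NULL,
        AttributeValue nvarchar(500) NULL,
        PRIMARY KEY (EventID, AttributeID)
    );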

sTodorov
I was considering this but am not sure how I could produce a single row of data for each attendee. How would you do that for, say, first_name, last_name (from the Attendee table) joined with two values stored as you suggest?
indyDean
+1 I was going to suggest this, but it sounded like he may not have wanted any new tables...but I guess a few more tables != hundreds :)
AGoodDisplayName
@indyDean: Well you can have an attribute first_name, with an attribute value for a specific Attendee in a mapping table. So: AttributeID, AttendeeID, AttributeValue.
sTodorov
A: 

This is the solution that we first proposed in ASP.NET Forums (which later became Community Server), and that the ASP.NET team built a similar version of into the ASP.NET 2.0 Membership system when they released it:

Property Bags on your domain objects

For example:

Event.Profile() or in your case, Event.Extras().

Basically, a property bag is a serialized collection of data stored as name/value pairs in a column (or columns). The ASP.NET 2.0 Membership went the route of storing names in a semicolon-delimited list, and values the same way:

Table: aspnet_Profile
Column: PropertyNames (separated by semicolons, and has a start index and end index)
Column: PropertyValues (separated by semicolons, and only stores the string value)

The downside to that approach is that it is all strings and has to be parsed manually (even though the membership system does it for you automatically).

More recently, my approach has been to build FormCollection and NameValueCollection C# extension methods that automatically serialize the collections to XML. I store that XML in its own column on the table, associated with that entity. I also have a deserializer C# extension method on XElement that turns the data back into the collection at runtime.

This gives you the power of actually querying those properties in XML via SQL (though that can be slow - always flatten out your read-only data).
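
For example, assuming the serializer produces something like <extras><item name="dinner" value="Veggie" /></extras> (the exact shape depends on your extension methods), the SQL-side read could look like:

    -- Pull a named property out of the XML bag; names and XML shape are illustrative
    SELECT a.AttendeeID,
           a.Extras.value('(/extras/item[@name="dinner"]/@value)[1]', 'nvarchar(50)') AS Dinner
    FROM dbo.Attendee AS a
    WHERE a.Extras.exist('/extras/item[@name="dinner" and @value="Veggie"]') = 1;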

A final note on runtime querying: the general rule we follow is, if you are going to query a property of an entity in normal application logic, then you move that property to an actual column on the table - and create the appropriate indexes. If that data will never be queried directly (for example, via Linq-to-Sql or EF), then leave it in the XML property bag.

Property bags give you the power of extending your domain models however you like, without having to modify the db schema.

eduncan911
+3  A: 

This is actually one of the toughest problems to solve efficiently. The SQL Server Customer Advisory Team has dedicated a white paper to the topic, which I highly recommend you read: Best Practices for Semantic Data Modeling for Performance and Scalability.

You basically have 3 options:

  1. XML columns
  2. Sparse columns
  3. EAV (name/value pairs)

Each solution comes with ups and downs. Off the top of my head I'd say XML is probably the one that gives you the best balance of power and flexibility, but the optimal solution really depends on lots of factors like data set sizes, the frequency at which new attributes are created, the actual process (human operators) that creates, populates, and uses these attributes, and not least your team's skill set (some might fare better with an EAV solution, some might fare better with an XML solution). If the attributes are created/managed under a central authority and adding new attributes is a reasonably rare event, then the sparse columns may be the better answer.
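
For reference, a bare-bones sparse-column sketch (illustrative names); the white paper covers each option in much more depth:

    -- Extra per-attendee options as sparse columns plus a column set,
    -- so unused options cost almost nothing and new ones are metadata-only adds
    CREATE TABLE dbo.AttendeeExtras (
        AttendeeID   int PRIMARY KEY,
        DinnerChoice varchar(20) SPARSE NULL,
        SmokingRoom  bit         SPARSE NULL,
        NeedsShuttle bit         SPARSE NULL,
        AllExtras    xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
    );

    -- Later, as new "can you add this to the form?" requests come in:
    ALTER TABLE dbo.AttendeeExtras ADD LodgingType varchar(20) SPARSE NULL;

    -- Reporting stays plain SQL
    SELECT COUNT(*) AS VeggieDinners
    FROM dbo.AttendeeExtras
    WHERE DinnerChoice = 'Veggie';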

Remus Rusanu
I would say sparse columns are not feasible - the DB structure should be static, which rules out sparse columns. Semantic modeling + XML are the only solutions allowing extension without code changes.
TomTom
Adding new sparse columns on-the-fly is a metadata-only operation; they can be added by the application itself as new attributes are created. Sparse columns were designed to address exactly this kind of scenario. Usually the application *must* use some sort of dynamic SQL to handle the new columns. Whether they are truly the optimal solution, though, is a different story. I do agree that XML and EAV are *usually* better suited and easier to implement.
Remus Rusanu
great white paper and links - thanks for the nudge :)
indyDean
I fixed the link for sparse columns as it was pointing haywire to JDBC driver info instead of the engine info
Remus Rusanu