I'm working on a database-driven web application (ASP.NET, SQL Server 2008) that receives structured XML data from various sources. The data resembles a set and often needs cleanup, so it is passed through the database as XML and turned into a result set for display.

I'd like to capture the produced 'clean' results, and send them to an archive database to persist them to disk.

The options I've considered so far are:

  • Serialize the entire 'clean' result set into an object (XML/.NET serialized), and send this back to the archive database

    • PRO: Easily repeatable - can profile/capture the database calls on the archive machine, and re-run them to identify any problems
    • CON: Versioning could be tricky
  • Store the cleaned results in a table, and periodically copy fresh records in this table to the archive machine

    • PRO: Easy to build - a quick scheduled job
    • CON: Harder to repro calls on the archive machine; would need to keep input table contents around

Are there any other options, and has anyone had any experience with similar situations?

+1  A: 

I have used both approaches successfully; which one I choose depends on the system.

Saving Raw XML

I tend to save raw XML when I am dealing with unstructured data, or with a messaging system where we want to track the messages. For example, an application I worked on collected messages from deployed Windows clients; we would dump the messages into a relational structure and then roll them into a warehouse. When I took over the project, we started storing the raw XML as it came in, because that gave us replayability and the ability to see exactly what was coming into the system.
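To make the replay idea concrete, here is a minimal sketch in Python, using SQLite as a stand-in for the archive database. The `raw_messages` table and the `open_archive`, `archive_message` and `replay` helpers are made-up names for illustration, not anything from the system described above; the same shape would apply to an `xml` or `nvarchar(max)` column in SQL Server.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_archive(path=":memory:"):
    """Open (or create) the hypothetical archive store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " source TEXT NOT NULL,"
        " received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " payload TEXT NOT NULL)"
    )
    return conn


def archive_message(conn, source, xml_text):
    """Store the message verbatim, before any cleanup touches it."""
    cur = conn.execute(
        "INSERT INTO raw_messages (source, payload) VALUES (?, ?)",
        (source, xml_text),
    )
    conn.commit()
    return cur.lastrowid


def replay(conn, handler):
    """Feed every archived message back through `handler`, oldest first."""
    rows = conn.execute(
        "SELECT id, payload FROM raw_messages ORDER BY id"
    ).fetchall()
    return [handler(msg_id, ET.fromstring(payload)) for msg_id, payload in rows]
```

Because the payload is stored untouched, a bug in the cleanup logic can be fixed and the same inputs run through it again, which is the main thing the raw copy buys you.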

Relational Data

If I am going to need any reporting or aggregation over the data, I would break the data out and store it in regular tables. I know you can query XML data in a database, but I try to avoid that. I might still save the original raw messages for replayability and troubleshooting.
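As a rough illustration of breaking the data out, the sketch below shreds an XML result set into an ordinary table and then aggregates it with plain SQL. The XML layout, the `clean_results` table and the column names are all assumptions invented for the example, and SQLite again stands in for the real database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical shape of a cleaned result set.
SAMPLE = """<results>
  <row source="feed-a" category="books" amount="12.50"/>
  <row source="feed-a" category="books" amount="7.50"/>
  <row source="feed-b" category="music" amount="3.00"/>
</results>"""


def shred(conn, xml_text):
    """Turn each <row> element into one relational row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clean_results ("
        " source TEXT, category TEXT, amount REAL)"
    )
    rows = [
        (r.get("source"), r.get("category"), float(r.get("amount")))
        for r in ET.fromstring(xml_text).iter("row")
    ]
    conn.executemany("INSERT INTO clean_results VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def totals_by_category(conn):
    """Reporting becomes an ordinary GROUP BY once the data is in a table."""
    return conn.execute(
        "SELECT category, SUM(amount) FROM clean_results"
        " GROUP BY category ORDER BY category"
    ).fetchall()
```

Once the rows are in a regular table, indexing and reporting tools work on them without any XML-specific query syntax.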

Saving a Binary Object

The last thing I have done is save an entire serialized binary object. I find this handy when the object graph is quite complex and the relationships between the objects are important. It does have a huge downside, which is versioning; however, I have managed versioning quite successfully, even across namespace changes, object hierarchy changes, etc. If you need to access the data from SQL, this is not the way to go.
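One common way to keep versioning manageable (not necessarily how it was done in the project above) is to wrap the serialized graph in an envelope that records a schema version, and to upgrade old blobs step by step on load. The sketch below uses Python's `pickle` on plain dictionaries; the version numbers, the `name` to `first_name`/`last_name` change and the helper names are hypothetical.

```python
import pickle

CURRENT_VERSION = 2


def _v1_to_v2(data):
    # Hypothetical schema change: v1 stored one "name", v2 splits it in two.
    first, _, last = data.pop("name").partition(" ")
    data["first_name"], data["last_name"] = first, last
    return data


# Maps a version to the function that upgrades it to the next version.
MIGRATIONS = {1: _v1_to_v2}


def dump(data, version=CURRENT_VERSION):
    """Serialize the graph together with the schema version it was written as."""
    envelope = {"version": version, "data": data}
    return pickle.dumps(envelope, protocol=pickle.HIGHEST_PROTOCOL)


def load(blob):
    """Deserialize, then apply migrations until the data is current.

    Only unpickle blobs from a trusted source; pickle can execute code on load.
    """
    envelope = pickle.loads(blob)
    version, data = envelope["version"], envelope["data"]
    while version < CURRENT_VERSION:
        data = MIGRATIONS[version](data)
        version += 1
    return data
```

For example, `load(dump({"name": "Ada Lovelace"}, version=1))` yields a dictionary with `first_name` and `last_name` keys. The blob stays opaque to SQL either way, so this only helps the application that reads it back.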

JoshBerke