views:

93

answers:

3

Imaging the following 3 relationships in a data model

Entity > Path > Link

Both relationships are 1 to many. So Entity has multiple paths and a path has multiple links.

Should I do this as 3 tables with relationships between the tables

Or create a table that stores that path information as XML.

This table (lets call it Paths), will store 1 path/row. So we end up with 2 tables instead of 3: Entity > Path

The XML looks something like this:

<path>
   <link entitySource=1 entityTarget=2>
   <link entitySource=2 entityTarget=6>
   <link entitySource=6 entityTarget=9>
</path>

What are the benefits of each design? I want to use the 3-table design and need a good explanation to convince the CTO why I should. He is convinced that the XML route is a better design because it will reduce database joins and hence improve read performance.

Read performance is important because this table will be used to store millions of records and needs to be searched quickly.

A: 

This depends on whether your domain model has Link as a full-class entity, or whether the collection of links for a path is just an attribute of a path.

The first case is true of you will ever need to search the data for individual links, to see what path they belong to, or for any other purpose.

The second might be true if you will never ever need to access or process individual links or any individual link, but will only ever access all links on a path as a complete collection.

Charles Bretana
+5  A: 

Tables

Some thoughts:

  • Parsing XML will be expensive
  • Assuming 100 million "rows", that's a 6GB ish string of unicode xml
  • Databases are designed to JOIN
  • Indexing?
  • Is this not a single self-referencing table that can be traversed with a CTE (SQL Server)
  • Finally, why is the CTO working at this level?
gbn
And editing an xml field when there is a change always causes problems here when we have used xml to store information that might change.Agree with you about the CTO, he should not be at this level of detail.
HLGEM
+2  A: 

What database engine are we talking about?

If is any of the commercial relational engines (SQL Server, Oracle, DB2, MySQL) then the best performance will come from using... relations. Tables, in other words. Indexes, foreign key constraints, this kind of stuff. While most have some XML support, it will not match relational performance. I can understand a discussion about shredding XML into tables vs. keeping it in XML blobs. But to model many-to-many relations and foreign keys as XML is as bad idea as I've seen one in quite few days (and I'm seeing bad ideas on a daily basis...).

On the other hand if you are talking about some esoteric XML oriented database, let us know :)

Remus Rusanu