tags:

views:

666

answers:

5

I am wondering if there are best practices for deciding between when a system should be modeled using XML and when it should be modeled using a relational database (I know you can store XML in a database, but there is an enormous difference between modelling a system using normalized db tables and modelling a system using XML-Schema). For concreteness sake, let's say you were modeling exercises in a gym. The "bench press" is actually a family of exericses, not a single exercise. You can lie down on a bench, or a ball. You can force you're back flat or allow cheating. You can use dumbells, barbells, cables or a universal machine. If you are using dumbells you can alternate arms or push simultaneously. You can have an inclined, declined or flat surface. My thinking is that due to the complexity (and possible complexity that I have yet to think of) that this would best be modelled using xml. Is this a good assessment? What other important factors should be considered?

Addendum: When I said XML, one of the technologies I had in the back of my mind was RDF (though I did not wish to limit the discussion to that), which would seem to have pros and cons compared to implementing the design in database tables. I'm not sure if the general antipathy some users feel towards XML would extend all the way to RDF (maybe so) but perhaps that will help focus the conversation a little.

+2  A: 

In general, XML is just a temporary file format to send data from one system to another. Or to store a small set of data, like configuration options and a bit more data. If your data needs are small and you're dealing with a single-user situation, XML will be just fine.

If you have to deal with a multi-user environment, you can still use XML but you would need to create a complex business layer around it, keeping tracks of modifications by all users and basically adding lots of multi-user functionality that a normal RDBMS offers as standard. If you have lots of data, there is a risk that your XML file becomes too big. The XML standard is a bit bloated and if you have to work with XML files of 500 MB each, I hope you have lots and lots of patience.

There are, of course, other alternatives. I created a simple web crawler once which would download a webpage, extract all URLs in it and then repeat this action for every URL. It used about 20 threads that were all downloading pages and the number of URLs would grow into millions. I wanted to avoid downloading a single URL twice so I had to filter out duplicates. Using XML would a nightmare, considering the amount of data. Using a database was overkill since all I needed was a single table with a single field: URL. So I wrote a special hashing algorithm and created my own file-based hashing table solution. It was really fast too, checking several thousands of URL's per second, if it didn't have to download that pages...

With situations like this exercise, I would start by creating a simple XML schema by using some modeling tool for XML. (Altova's XMLSpy is good at this.) When I think my data would fit nicely in this XSD, I start to create a database, where every element will be converted to a table. As a result, I would have a good, relational database plus some definition for an XML format that can be used to import/export the same data to/from the database.

Workshop Alex
Basically, what I'm saying is start modelling in XML. When you're happy with it, convert it to a database.
Workshop Alex
+1  A: 

How about "none of the above"?

I would first model the Domain using a conceptual modeling tool like NORMA. This would permit you to concentrate on the model, until you're finished. At that point, NORMA can generate DDL for several popular databases, as well as an XML schema.

John Saunders
+5  A: 

In the 1960's, data management systems were invented/conceived/elaborated which were all based on the idea that data can be organized hierarchically. IMS being one of them. The fallacies/deficiencies/shortcomings of those systems immediately became clear to anyone intensively using them (for one, they tend to lead to 'query biases' : in hierarchical systems, it is often easy to query which contracts exist for a given customer, and at the same time almost impossible to query which customers are involved in a given contract).

All those deficiencies have ultimately led to the invention of the relational model.

So if you want to know whether XML is suitable as a solution for ANY DATA MANAGEMENT PROBLEM WHATSOEVER, ask yourself this : "is XML hierarchical in nature or not ?".

The success of XML in the marketplace only proves the correctness of the observation that "those who don't know history are doomed to repeat it".

+1 thanks for the nice historical view
KLE
A: 

Your exercising example is a good one, but I think you reach the wrong conclusion.

My thinking is that due to the complexity (and possible complexity that I have yet to think of) that this would best be modelled using xml.

I think this conclusion is based on the faulty assumption that XML provides greater modeling flexibility than the relational model. In fact (as Erwin Smout skillfully describes), the relational model is inherently more flexible than XML, because XML is strictly hierarchical whereas the relational model allows for many-to-many relationships of arbitrary complexity.

XML could potentially be more flexible at run time if you don't require a strict schema and want to be able to store just about anything. But then we're not really talking about modeling anymore.

John M Gant
+1  A: 

Your exercise example could be modeled in many ways. For some experience and wisdom on the question of when xml's hierarchical model shows an advantage, read Ron Burrett:

http://www.rpbourret.com/xml/XMLAndDatabases.htm

There are cases where native xml DBs show enormous advantages of RDBs when the content to be stored is semi-structured. @Smout, es it is easier and safer to store the customer-contract-customer data in an RDB -- but what happens when you also have to store the contract?

RDF contrasts with both relational models and with xml models. RDF is designed for an "open-world" representation of data in which you can never be sure that you know everything at the time you have to compute. The fact that RDF can be expressed in xml is convenient, but incidental. It has other expressions as well.

Do some reading at EMC XML Technologies and MarkLogic as well.