views:

878

answers:

6

Hi all,

I came across a CMS known as GetSimple. It uses XML to store all its internal data; in effect, it uses XML as a database. Could anyone explain to me the advantages and disadvantages of using XML as a database?

Thanks in Advance. Tanmoy

A: 

A quick search turned up this article on XML.com.

The lead-in is: 'Following a recent XML-DEV discussion on how to choose the most appropriate database for your XML application, the XML-Deviant captures the indicators that will help bring you closer to a decision.'

The article discusses the distinction between "data" and "document".

lexu
+2  A: 

Some information, quoted from this site:

If your application requires moving data between enterprises, XML is a good solution. XML lets you send data across the Internet and through firewalls by using the standard HTTP protocol. XML is also a good choice if your application needs to move data between hardware or software platforms (OSs). XML is not machine- or OS-specific. Finally, XML is a good choice if you simply want to ensure that your application or data source is robust even if the data schema changes. XML enables your application to be extensible because you access the XML-formatted data by using element and attribute names instead of offsets, which structured programming languages use. Note that using element and attribute names to access data in XML is similar to accessing fields by name within a SQL Server table. If you have one or more of these application requirements, then XML is a good solution for you.

Next, you need to determine the best place to generate or consume XML within your application, which is an important decision because using XML incurs processing overhead. This overhead manifests itself in different ways depending on whether you're consuming or producing XML. For XML consumers, you need—at minimum—a method to parse the XML. You'll likely also need an object model to access the parsed data. For XML producers, converting native data formats to XML incurs overhead. On the middle tier, the processing overhead is crucial. If your middle-tier program manipulates, performs computations on, or reformats the data and your database is inside the firewall, XML shouldn't be your first choice. In this case, requesting a normal result set from the database and using traditional data-access methodologies to perform application processing will be more efficient. After processing is complete, the middle-tier application can generate the XML output. Using traditional data-access methodologies avoids the overhead of generating XML in the database as well as the overhead of parsing the XML and building an object model on the middle tier. The only potential benefit from generating XML on the middle tier is that you can loosely couple your middle-tier application and your database, but the cost is significant.

Now, let's apply these usage guidelines to the scenario you describe in your question. You don't seem to have a requirement to move data between enterprises, across the Internet, or through firewalls. So, unless you're trying to make your applications more extensible, XML isn't a good choice for your scenario. Traditional data-access technologies will meet your needs. But to demonstrate the value of XML, let's assume that you need to make your application extensible. You can upgrade to SQL Server 2000 and use its integrated XML support. This is your best option because it provides the most flexibility. If you must access your data from SQL Server 7.0 or 6.5, then check out the SQL Server XML technology preview at http://msdn.microsoft.com/downloads/samples/internet/xml/sqlxml/default.asp. This preview provides functionality similar to the XML support in SQL Server 2000, but the preview works with SQL Server 7.0 and 6.5. (For information about the differences between SQL Server 2000's XML integration and Microsoft's XML technology preview, see Bob Beauchemin, "The XML Files," September 2000.)

Kyle Rozendo
Next time give a short synopsis and provide a link. Then sum up in your own words k thnx
Elizabeth Buckwalter
I bolded the main points, and also I only pasted the relevant information. The -1 was very, very unnecessary.
Kyle Rozendo
@Elizabeth there's nothing wrong with including the relevant parts of external resources, even if they are moderately long. It is properly attributed and clear.
Rex M
Hi Kyle, thanks for the detailed answer. It was really helpful. But do you know of any article written by an open source technologist?
Tanmoy
Hi Tanmoy, I am sorry, but I don't really understand why that would make a difference? The article was written for the merits of a database versus flat-file XML. Even though MSSQL is named as the alternative, it is still just a comparison between XML vs. DB.
Kyle Rozendo
+2  A: 

Using XML as your database will work fine as long as your datasets stay relatively small, meaning that all of the data can fit in memory and stay there comfortably. Once your data grows to the point where it no longer fits in memory, you will probably start seeing serious performance degradation.
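
To make the memory point concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a made-up <record> element. It streams through a document with iterparse instead of loading the whole tree at once. The function name count_records and the sample data are hypothetical, and this only illustrates the general idea, not how GetSimple works.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count elements named `tag` while streaming through `source`."""
    count = 0
    # iterparse yields each element as its closing tag is read,
    # so the full document never has to be parsed up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Free the element's text, attributes and children once handled.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return count


# Hypothetical sample document, kept in memory here only for the demo.
sample = io.BytesIO(b"<data><record id='1'/><record id='2'/><record id='3'/></data>")
total = count_records(sample)
```

Streaming like this keeps memory use down for read-only scans, but it does nothing for random access or updates, which is where a file-based XML store tends to struggle as it grows.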

baldy
Hi baldy, it's true that performance degradation may happen, but do you know how much data can be stored in an XML database?
Tanmoy
Honestly, you'll start seeing inefficient queries long before you fill your memory. Many XML or flatfile DBs will start slowing down when they're over 20 or 30 megs, depending on the structure of your data.
Paul McMillan
A: 

Actually, XML documents are already databases. Whatever you do with DOM, SAX, Pull or VTD-XML, you will still need to do after storing the data in a database. In my view it is more or less a change of perspective.

vtd-xml-author
Why is this answer Community Wiki?
John Saunders
OP's choice... perhaps wondering what the checkbox does?
Marc Gravell
Why not community wiki? I don't really know what it is for in the first place.
vtd-xml-author
A: 

I think it also depends on the complexity of your queries. If you're reasonably comfortable writing XPath queries, then even if you have to query data across a couple of "dimensions" you're still left with reasonably non-horrible XPath code.

However, if you're talking about a data model that would require joins across 3 or 4 tables in SQL, you're probably already close to the point at which XPath stops scaling so well. I can't really say how well this works out with other query languages like XQuery or XLinq - maybe the trade off is in a different place.
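
As a rough illustration of that tipping point, here is a small sketch using the limited XPath subset in Python's xml.etree.ElementTree. The <library>/<book> structure and the helper authors_by_genre are made up for the example. Filtering on two "dimensions" fits in one path expression, while anything join-like already needs extra code around the queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
XML = """
<library>
  <book year="2001" genre="tech"><title>XML Basics</title><author>Ann</author></book>
  <book year="2005" genre="tech"><title>SQL Joins</title><author>Bob</author></book>
  <book year="2005" genre="fiction"><title>A Story</title><author>Ann</author></book>
</library>
"""

root = ET.fromstring(XML)

# Two "dimensions" (genre and year) in a single, still readable path.
tech_2005 = [
    book.findtext("title")
    for book in root.findall("./book[@genre='tech'][@year='2005']")
]


def authors_by_genre(tree_root, genre):
    """Return the set of authors who have a book in the given genre."""
    return {
        book.findtext("author")
        for book in tree_root.findall(f"./book[@genre='{genre}']")
    }


# A join-like question (authors with both tech and fiction books)
# needs two queries plus set logic in the host language.
both = authors_by_genre(root, "tech") & authors_by_genre(root, "fiction")
```

A full XPath or XQuery engine can express more than ElementTree's subset, so where exactly the trade-off falls will depend on the tool.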

Dominic Cronin
A: 

Also, see http://www.joelonsoftware.com/articles/fog0000000319.html for an explanation of why "you can't implement the SQL statement SELECT author FROM books fast when your data is stored in XML."

("fast" is the key word here)
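
A minimal sketch of the contrast, assuming Python's standard xml.etree.ElementTree and sqlite3 modules and a made-up two-row books dataset. With XML, the whole document has to be parsed before any author can be returned; with a database, the engine reads rows from its own storage format. The function names are hypothetical, and no timings are claimed here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample rows: (author, title).
BOOKS = [("Ann", "XML Basics"), ("Bob", "SQL Joins")]


def authors_from_xml(xml_text):
    """Parse the entire document, then walk every <book> element."""
    root = ET.fromstring(xml_text)
    return [book.findtext("author") for book in root.iter("book")]


def authors_from_sql(conn):
    """Let the database engine fetch the column from its own storage."""
    rows = conn.execute("SELECT author FROM books ORDER BY rowid")
    return [row[0] for row in rows]


# Build the same data in both forms.
xml_text = "<books>" + "".join(
    f"<book><author>{author}</author><title>{title}</title></book>"
    for author, title in BOOKS
) + "</books>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, title TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)", BOOKS)

xml_authors = authors_from_xml(xml_text)
sql_authors = authors_from_sql(conn)
```

Both calls return the same list; the difference the article points to is in how much work each storage format forces on the reader as the data grows.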

anonymous