In which cases storing data as XML is preferable to RDBMS and why?
Can you give any analogy?
In which cases storing data as XML is preferable to RDBMS and why?
Can you give any analogy?
If you have to move them around in a compatible, human accessible format, or if the conceptual model of your data does not follow easily the relational model.
Also, if your application wants to store data in an easy to parse, but still extensible format. Take for example a browser. I would store the bookmarks data in an XML file, not in a relational table.
If the data can naturally be described in a tree like structure, XML could be OK. I'd prefer a lighter alternative, though. YAML and JSON are candidates.
I would store XML in a database if I already get it as XML (like from a web service call or something) and need to keep the "original" data copy somewhere.
I might also store something in XML that's highly hierarchical and/or only semi-structured, something that's just awkward and tricky to express in rows/columns that a normal RDBMS table excels at.
Typically, whenever you need to process the information in your database or apps using your database, it's easier to do if it's in relational tables. So unless you really have good reason to use XML, don't just use it because you're too lazy to create a few tables.
XML has its merits and all - but it's in general rather verbose, a bit cumbersome to handle at times (it's much easier to SELECT on a column in your table than reaching into the XML to grab out a value), and overall typically slower than using relational tables straight away.
SELECT fieldName
FROM table
is easier to use and read and comprehend than
SELECT
xmlData.value('(xpath-expression)[1]', 'int') as 'Field'
FROM table
So, to sum up: use it if you really see a need and benefit, but don't overdo it (just because you can or because it's cool or sexy). Use with caution and with good reasons.
Marc
I would never favour storing many XML
files worth of data over a database anytime.
XML is good for readable and extendable file formats - e.g. when you save in an application. XML is preferable as it allows anyone else to use that file format.
Another time XML
is preferable is for configuration settings. For example, I do a lot of game programming and it would be madness for myself to store gameplay settings in a database. An XML
(or any file for that matter) is much easier.
However given records (data in records) such as products, or customers you would be very wrong to favour something other than a database to store this data. Back up, speed and scalability being three examples why.
So the answer is - it depends.
You have to be the judge and make the right call.
As for an analogy:
"XML is like tupperware. If you went shopping and got home, yes you could put each item in a custom made tupperware box, but at the end of the day its probably easier just to put stuff in the fridge."
XML has its place though, as I menetioned above.
Summary
If you don't have much data and you're in total control of it (no dependent 3rd parties), XML is a nice option. Otherwise, RDBMS - see below for more reasons.
Analogy
If RDBMS is a filing cabinet (drawers of same-sized records organized by some index) then XML is a back-pack (not-necessarily-organized bag of randomly-sized records, may stick out at the corners).
Reasons for XML
1) Flexibility
If your schema is either very loose or changes over time, XML is preferable as versioning RDMS is hard once there's data inside it. In my experience, XML Serialization, XSLT and XPath queries are resilient to changes in the XML schema and can continue to work for old/new clients. For example, you can add some new elements into a document and an older EXE that reads that document will just ignore those elements. An RDBMS query that does 'SELECT * FROM table' where you just added a column will have undefined results.
2) Deployment
Easy - just ship your EXE.
3) Debugability
Easy to 'debug' the data - the XML could be human-readable already; if not, XSLT may make it more readable.
4) Interoperability
You can hand the XML off to other systems and not care what platform/technology they use.
Reasons for RDBMS
1) Performance
If you have a lot of data, then the indexing features of a RDBMS will give you best performance. Reading a large XML (> 1000 records) is expensive if you're fundamentally just trying to find the record with ID=123, which RDBMS can do in a snap. Stored Procedures would make this even better.
2) Security
You can secure parts of RDBMS through permissions - e.g. grant/deny SELECT access to various users.
3) Business Tools
There are many RDBMS tools for things such as OLAP and reporting.
I use XML by hard. In Addition with http://commons.apache.org/digester/ it is an powerful source. Just my 2 Cents.
I would be very careful because XML is not a data storage device. XML is a meta-language, a language for creating languages. XML is frequently misused as a data storage device, but that frequency does not provide credence to a faulty understanding of the technology.
I have heard somebody attempt to argue this claiming that meta-language means a language of data about data, which is still not a data storage device. A language created to represent a description upon the structure of data is not a language of the data itself, and while an instance of the language may be a structure created of data the language itself is not.
If your intent is not to create a language or to represent a language for a class of data I would not recommend using XML as the result will be bloated, slow, and you will not likely use any of the assisting technologies that make XML truly powerful. As alternatives others have mentioned YAML and JSON, which are considerably lighter. I would recommend trying one of those. If the meta data, and its structure/relationships, is more important to you than the data itself then I would stick with XML because because the potential for processing and manipulation is incredibly extensible.
Mostly XML is preferred over database for writing application preferences/settings. I assume it because, 1. It is easier to corrupt database files 2. XML support cross platform portability.
I assume you mean "XML in a sequential text file". Otherwise it's not really an either/or question: you could store XML in a relational database, you could export a relational database to XML, etc. That said ...
XML is great for complex data streams with unpredictable data. Like a text file: at any point it could be meaningful to start a new chapter, include a footnote, switch to italics etc. You would not normally expect every chapter to have the same number of footnotes, or even that every text document would include footnotes. You would not expect every document to have six words of plain text followed by three words in italics followed by a footnote, etc. XML allows tags to occur in a very flexible way.
Relational databases are great for data in consistent formats. For, say, a customer order, you would expect to have a customer name, address, item ordered, price, etc. An order with no customer specified is almost surely not processable.
Many people today are using XML for all data storage and transfer. I think this is a big mistake. XML is not only very cumbersome for data that comes in predictable formats -- the so-called "bracket tax" -- but it also creates all sorts of opportunities for error. A fixed format like CSV provides no way to even say that you want two customer names on the same order. There's only one place to put it, there's no way to put it twice. But in XML you could include two "customer" tags or attributes. CSV gives no way to specify undefined attributes. There's no way that the customer name is in italics or the price is in kilograms. But in XML there could be any arbitrary set of attributes. Thus a program trying to process an XML stream for fixed data has to deal with all sorts of possible errors that would not even come up in other formats.
Lots of good responses here, but they all miss the most important point. The big problem a relational database solves for you is synchronizing multiuser access. For a single user program, you can read everything into memory from a set of XML files at startup and write it all out again on save - if you can find a market for single-user software in the first place. For the general case of multiuser access, that solution won't work, and if you start messing around with fine-grained locking, you're basically setting out to redo about 30 years of work that is already done for you if you use a relational database.