tags:

views:

442

answers:

7

I have a system which generates a large number of XML documents every day (of the order of 1 million) and I would like to be able to store and index these so that I can, for example, search for all documents with a certain field set to a given value.

I understand that there are fundamentally two types of XML database, those that provide XML support on top of a conventional relational database and those that are "native" XML database. Given that I am open to using either, what would you recommend?

+1  A: 

I cannot give you a good candidate, but if you want to avoid picking a bad one, avoid Oracle XmlDB. It's slow and buggy as hell. One of the worst extension Oracle made to its DBMS.

gizmo
Thank you, I was just looking into it.
Matt Luongo
+1  A: 

According to the people who worked on it monetDB is quite able to handle xml. Basically, when you want to do xquery on the contents, you want to have something that can do that properly. The structure of an XML document which is very flexible and of undetermined length is fundamentally different than that of an rdbms. This means that things need to be stored and indexed smartly. For traditional dbms systems it is an easy shortcut to store the xml as a BLOB (binary object). But BLOB's aren't generally indexed, and live mainly outside the primary data store. To parse xml properly the system must go into the xml when storing. MonetDB apparently does this.

Paul de Vrieze
+6  A: 

Microsoft SQL Server has support for XML columns. This is more than just BLOB/TEXT support.

You can use XML columns in an unstructured manner, where SQL Server will just ensure they are correct XML. This allows storage of arbitrary XML documents inside SQL Server, but still ensuring you're dealing with XML and not just arbitrary bytes/characters. SQL Server lets you query on top of this using XQuery.

You can also create XML columns that conform to a schema using XSD. More interestingly, SQL Server allows indexing the XML so that your XPath queries can perform well.

See "What's New for XML in SQL Server 2008" for more information. (Although most of the XML support exists in SQL Server 2005.)

MichaelGG
A: 

Definitely try MS-SQL, Oracle, and other existing systems that support XQuery.

But, if the XML-based queries you'll need are known in advance, it might be easier to just store the XML in a BLOB, and add an indexed field or two with a copy of the relevant XML element(s).

Javier
+2  A: 

DB2 9.x with PureXML support is a possibility.

Jonathan Leffler
A: 

You might also wish to checkout MarkLogic Server or eXist.

If you have a moderate amount of content(a couple of GB) then eXist will do. Beyond that you'll probably want to look into MarkLogic.

You can download and check out both of them, since eXist is free and MarkLogic has a community license for you to play around with.

A: 

If you are looking for a native XML db, I definitely suggest Sedna. Brilliant developer support as well.

JJK