My company is in the education industry and we use XML to store course content. We also store some course-related information (mostly metadata) in a relational database. Right now we are in the process of switching from our proprietary XML Schema to DocBook 5. Along with the switch, we want to move the course-related information from the database into the XML files. The reason for this is to have all course data in one place and to put it under Subversion.

However, we would like to keep the flexibility of the relational database and be able to easily extract specific information about a course from an XML document. XQuery seems to be up to the task, so I have been researching databases that support it, but so far I could not find what I need. What I basically want is to keep my XML files in a certain directory structure and, on top of this, have a system that indexes the files and lets me select anything out of any file using XQuery. This way I can "have my cake and eat it too": I will have an XQuery interface and still keep my files in plain text and versioned. Is there anything out there that even remotely resembles what I want?

If you think that what I am asking for is nonsense, please make an alternative suggestion.

On a related note: which XML databases (preferably native and open source) do you have experience with, and what would you recommend?

+1  A: 

Take a look at eXist; it is an open source XML database that supports XQuery.

preston
eXist does not store data as plain-text XML, but in a persistent DOM.
xsaero00
+1  A: 

For a native XML database you can try Berkeley XMLDB, which is maintained by Oracle but is open source.

If you would like a really robust solution, you can use MarkLogic Server. It is a commercial product, so there is a cost.

+1  A: 

I don't know of any XQuery implementation that will both index your documents and leave them on the filesystem.

But if you have a small amount of data, you could use the filesystem and use Saxon as your XQuery implementation to query the documents. Saxon can treat any directory as a "collection" (in a pretty flexible way), which means you can query across a bunch of documents at the same time.
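To illustrate the "directory as a collection" idea without any database, here is a minimal sketch in Python. It is not Saxon and not XQuery: it swaps in the standard library's `xml.etree.ElementTree`, which only supports a limited XPath subset, and it re-parses every file on each query, so there is no index. The function name `query_collection`, the `db` prefix, and the sample directory layout are assumptions made up for this example; only the DocBook 5 namespace URI is real.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# DocBook 5 elements live in this namespace; "db" is just a local prefix.
DOCBOOK_NS = {"db": "http://docbook.org/ns/docbook"}


def query_collection(root_dir, xpath, namespaces=DOCBOOK_NS):
    """Yield (path, element) for each match of `xpath` in every *.xml file
    found under root_dir (searched recursively, in sorted path order)."""
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        # The expression is evaluated relative to each document's root element.
        for element in tree.getroot().iterfind(xpath, namespaces):
            yield path, element


if __name__ == "__main__":
    # Hypothetical layout: one DocBook file per course directory.
    with tempfile.TemporaryDirectory() as tmp:
        course = Path(tmp) / "courses" / "algebra"
        course.mkdir(parents=True)
        (course / "book.xml").write_text(
            '<book xmlns="http://docbook.org/ns/docbook" version="5.0">'
            "<info><title>Algebra I</title></info></book>",
            encoding="utf-8",
        )
        for path, title in query_collection(tmp, "./db:info/db:title"):
            print(path.name, title.text)
```

Because the files stay on disk as plain text, they can live in a Subversion working copy unchanged. The trade-off is the one described above: every query costs a full parse of every file, which is only reasonable for a small amount of data.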

If you have a moderate amount of data (and the filesystem approach is too slow), then eXist is a good open-source option that I've used. One advantage is that it has a WebDAV interface, which means it's very easy to edit the files and view them as just another directory. eXist has a history trigger which will store old versions of documents as they're replaced; I haven't used it, but you might be able to build something around it that would give you the version control you need. It's also possible to back up the eXist database to a file, which you'd then version control using Subversion.

If you have a large amount of data or eXist isn't robust enough, then MarkLogic Server is the leading commercial XML database and I believe it has some support for versioning internally.

JeniT
A: 

I have worked with Berkeley XMLDB a lot over the past year and it's kind of a mixed bag.

Pros: FAST, XQuery and XUpdate support, Oracle maintains it well, many languages have interfaces, small footprint, embedded, file based (maybe some see that as a con?), extremely flexible for some wicked awesome queries

Cons: it's a big pain in the butt if you are dealing with any kind of concurrency, environments are a weird concept for any relational DB person to pick up, and it's very sensitive in general and tends to segfault if it's not happy

I agree with the other poster: moving to a more robust solution comes at a big cost, usually in speed. If I were going to try anything else, it would be eXist, but I'm deterred by the overhead of the Java packaging.

Conceptually, XML databases rock super hard; it's just that the implementations are somewhat immature, and there is a lack of competition and a lack of industry know-how.

A: 

MarkLogic XML database server (4.x) has a couple of good features that you could try.

  1. It has a good native XQuery implementation with which you can query your XML documents.

  2. It has a built-in search engine/search parser and an XQuery extension that can index your documents fast.

  3. It has simple REST-based protocol support, so it can talk to external systems.

kadalamittai