I have been assigned to develop a system that receives XML from multiple sources (millions of XML documents) and stores it in some kind of database. Judging from the XML I would receive, there won't be any concrete structure, even among documents from the same source. For this reason I don't think I can suggest an RDBMS, and I am currently looking at NoSQL databases. We need a system that supports CRUD and is fast on reads.

I have been looking at MarkLogic and eXist, which are both XML-based NoSQL databases. Has anyone had experience with them? Any other suggestions? Thanks

+1  A: 

Even if the XML doesn't have a particular structure, as long as it's well-formed XML you could still store it in a traditional SQL database by essentially writing out the DOM. You would have one table for elements and one for attributes. Each element row and attribute row would have a foreign key column pointing to its parent element and a column for its name.
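
As a rough illustration of that element/attribute layout, here is a minimal sketch using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table names (`element`, `attribute`), the column names, and the sample `<person>` document are all assumptions made up for this example, and SQLite merely stands in for whichever SQL database you would actually use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per element, one row per attribute.
SCHEMA = """
CREATE TABLE element (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES element(id),  -- NULL for a document root
    name      TEXT NOT NULL,
    text      TEXT
);
CREATE TABLE attribute (
    id         INTEGER PRIMARY KEY,
    element_id INTEGER NOT NULL REFERENCES element(id),
    name       TEXT NOT NULL,
    value      TEXT
);
CREATE INDEX element_name_idx ON element(name);
"""


def store_element(conn, node, parent_id=None):
    """Recursively write one element, its attributes, and its children."""
    text = (node.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO element (parent_id, name, text) VALUES (?, ?, ?)",
        (parent_id, node.tag, text),
    )
    element_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (element_id, name, value) VALUES (?, ?, ?)",
        [(element_id, key, value) for key, value in node.attrib.items()],
    )
    for child in node:
        store_element(conn, child, element_id)
    return element_id


def store_document(conn, xml_text):
    """Parse an XML string and store its whole tree; returns the root row id."""
    return store_element(conn, ET.fromstring(xml_text))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_document(conn, '<person id="7"><name>Ann</name><city>Oslo</city></person>')

# Looking up a specific tag is an indexed equality match, not a text scan.
rows = conn.execute(
    "SELECT text FROM element WHERE name = ?", ("name",)
).fetchall()
```

The index on `element.name` is what lets a lookup for a specific tag avoid scanning every stored document.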

You say you need to have fast reads. What exactly are you reading? If you'll be looking for specific tags, then a traditional SQL database would still be able to query that pretty quickly.

Reinderien
There would be an XML file for each person containing a lot of his/her details. These files range from 12 KB to 50 KB each, and I need to search for something inside the XML itself. I was thinking that putting them into an XML field and running "select * from table where detail like '%<person information>%'" would be slow, especially once it reaches millions of records (which would really be the case after 2-3 months). Am I right on this one? Thanks
monmonja
Is there absolutely no consistent structure to the XML? Even if there are one or two tags that are the same between every file, that would help to separate data between columns and make queries faster. What are these "multiple sources"?
Reinderien
I think separating out the one or two common tags is a good idea, thanks. As for the multiple sources: we fetch person XML from multiple companies. The XML may differ from company to company, and even within one company it may differ from country branch to country branch. Anyhow, is sticking to a traditional database what you're suggesting? Thanks a lot
monmonja
I would say, stick to a traditionally structured database, and have glue code that parses each company's XML into your own format. It might seem like a lot of work initially, but you'll be thankful in the long run if you make your own data essentially a haven for format sanity and mash everyone else's data into what you have.
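
A minimal sketch of what such glue code might look like, assuming each source can be described by a small mapping from your own column names to paths in that source's XML. The source names (`company_a`, `company_b`), the tag names, and the `name`/`email` columns are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source mappings: our column name -> path in that source's XML.
MAPPINGS = {
    "company_a": {"name": "fullName", "email": "contact/email"},
    "company_b": {"name": "personName", "email": "mail"},
}


def normalize(source, xml_text):
    """Convert one source-specific person document into our own flat record."""
    root = ET.fromstring(xml_text)
    record = {"source": source}
    for column, path in MAPPINGS[source].items():
        # findtext returns None when the tag is missing from this document.
        record[column] = root.findtext(path)
    return record


a = normalize(
    "company_a",
    "<person><fullName>Ann Lee</fullName>"
    "<contact><email>ann@example.com</email></contact></person>",
)
b = normalize("company_b", "<p><personName>Bo Chan</personName></p>")
```

Each normalized record can then go into ordinary typed columns, and the original XML can be kept alongside it if the fields that were not mapped still matter.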
Reinderien
A: 

I don't have any practical experience with it, but I have read that IBM DB2 has special XML capabilities.

SQL Server has an xml field type, but it imposes some restrictions when you have such fields in a table. An annoying one (for me) is that you cannot use such a table on a linked server.

iDevlop
+1  A: 

I am looking for something similar myself, and found that there are dedicated XML databases that do just that.

Look here: Wikipedia

I found that this one is pretty good: Sedna

schoetbi
Thanks, I will look at this too
monmonja
A: 

Take a look at this project: http://exist.sourceforge.net/

JLBarros