views:

22

answers:

3

This question is theoretical as well as practical. Any results indicating useful resources on optimizing queries will be appreciated.

There is a large SQL database which stores a large amount of data stored in SQLXML fields. Querying the XML directly is not fast enough.

I have looked at some MSDN articles on optimizing SQLXML (i.e. http://msdn.microsoft.com/en-us/library/aa902661(SQL.80).aspx), and I am aware that indexing the searchable xml fields will increase search speed.

Can anyone recommend any additional resources for optimizing databases, either in this environment or in general, that are particularly useful? As always, I appreciate y'alls help

A: 

Whilst not exactly answering your question you might want to think asbout a different searching strategy. SQL Server/Oracle and MySQL are all excellent for storing large amounts of relational data however in most cases they are not that great when it comes to searching for text (obviously this depends on what you're searching on and your indexes).

I would suggest that you spend a bit of time looking at a search engine like Lucene as it might suit your needs better than SQL?

Kane
so - the optimizing strategy is to not use the current database? I call shenanigans.
David B
A: 

Show estimated execution plan.

Also:

SET STATISTICS IO ON
SET STATISTICS TIME ON
David B
A: 

It depends on what you need to do to your XML. I have a similar setup where the table structure was made "generic" and anything product-specific was stashed into an XML field.

We also noticed the hard way that querying the XML is not exceptionally fast.... and using XML indices (which SQL Server offers, too) caused our database size to jump from roughly 1 GB to over 10 GB.....

What we're doing now for select elements from the XML is this:

  • create a user-defined function which gets the XML contents as its parameter
  • extract the value from that XML parameter
  • using that UDF to define a computed, persisted column in the parent table

With this, we can extract certain key values (e.g. "ProductID" or "ProductName") from the XML, and store them on the parent table as a column. It's computed, e.g. it's always up to date, and since it's also persisted, it's stored with the table data, e.g. it isn't constantly re-queried and re-computed. And since it's persisted with the table data, you can even put an index on it.

This works very well, but it only works for cases where you have isolated, single-value things that you want to fish out of the XML. For that case, it's a great solution, and it does speed up queries on those values by several orders of magnitude.

marc_s
I had considered this solution, and it would work for the current domain of the project. Considering future diversification, I would like keep it as searchable, but I think common search functions could be handled this way as long as the more specific searches could still be performed. How has this worked for you as far as scalability?
smartcaveman
@smartcaveman: those elements that end up being an INT or a short string work very well -- you can make them persisted computed columns on the table containing the XML, and they're absolutely like regular INT or VARCHAR columns. Scalability is not an issue at all.
marc_s