I'm trying to work out the best way to design a storage system for fast text searching.

  • There will be a different file format for each customer
  • These files are XML, and the field names and attributes are not standard, and do not follow a schema
  • The customer has an option to choose certain fields to be searchable
  • There could be 100,000 records per file, per customer.

I'm processing these files and generating a table based on the columns specified by the client configuration.

What type of database schema would you choose: SQL, flat files, or some other technology?

There are going to be a lot of rows to search, and I don't know the best way to go about this.

Create a table called SearchColumns:

  Id
  CustomerId
  DisplayValue

Create a table called SearchRecords:

  Id
  SearchColumnId
  SearchText
With this scenario, the SearchRecords table is going to get very large very fast, and because SearchText is going to be varchar(200), LIKE queries are going to be incredibly slow.

I've also considered full-text search on the SearchRecords table, but in testing on a sample table I'm not getting the results I would expect.
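For reference, a typical full-text setup on that table looks roughly like this (assuming SQL Server; PK_SearchRecords stands in for whatever the primary key index is actually named):

    CREATE FULLTEXT CATALOG SearchCatalog;

    CREATE FULLTEXT INDEX ON SearchRecords (SearchText)
        KEY INDEX PK_SearchRecords
        ON SearchCatalog;

    -- CONTAINS can use the full-text index; LIKE '%term%' cannot
    SELECT Id, SearchColumnId, SearchText
    FROM   SearchRecords
    WHERE  CONTAINS(SearchText, '"smith"');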

I've also considered separate databases per customer. This will help with table size in the short term, but after months or years the tables will still grow large and queries will slow down again.

What would you do to make a FAST searchable table, that will potentially hold millions of records?

Edit: information regarding the data I'm storing:

I'm pulling values such as FullName, Address, and Account Numbers from the XML file. These fields are pretty small and would most likely never exceed 200 characters.

+1  A: 

I'm not sure I understand the question. Do you have a record-storage schema selected and need to know how best to get stuff in it, or do you need the storage schema as well? Are you planning on parsing the XML into columns that are nText, or just loading the XML file, tags and all, into nText columns?

Generally speaking, go for a narrow, deep table over a wide, shallow table if you're looking for performance. Narrow tables usually require fewer indices to speed searching on the most common columns, and those indices allow the engine to break the search into parallelizable chunks. Most engines are also smart enough to prioritize "cheap" filter conditions over "expensive" ones; the LIKE clause, if present, will almost surely be evaluated last in a compound WHERE clause, so if you can provide any other information to narrow the search, especially on indexed columns, you can speed up the general performance of your query.
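As an illustration (borrowing the table names from the question; the literal 42 and the search term are just placeholders), an index on the cheap filter column lets the engine cut the candidate rows down before the LIKE is evaluated:

    -- Index the inexpensive, selective column
    CREATE INDEX IX_SearchRecords_SearchColumnId
        ON SearchRecords (SearchColumnId);

    -- The equality predicate narrows the rows via the index;
    -- the LIKE is then applied only to that smaller set.
    SELECT Id, SearchText
    FROM   SearchRecords
    WHERE  SearchColumnId = 42
      AND  SearchText LIKE '%smith%';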

You may consider (I can't believe I'm about to recommend this) a key-question-answer schema for the main element data (the text between the opening and closing tags of each element). For any case in which even part of the schema definition is standardized, a traditional statically-defined table will be easier to work with on practically all counts. But if you don't know the structure of your data beyond the fact that it's XML, a static approach will require some sort of mapping between a particular file's metadata and a table of generic fields; in that case, key-question-answer combines the two for better query performance.

Whatever information uniquely identifies a particular record (and/or data you need to search very quickly to narrow result sets inexpensively) is your key, the element name is your question, and the value is your answer. This supports a very flexible data-naming standard. Because the data is XML, relevant data can also be stored as attributes of an element (part of the opening tag), so you may need similar but simpler tables for searchable attribute data, or you can normalize attribute data into the main table based on some well-known mapping. Having these very narrow, row-per-column lines also lets you easily move non-searched columns into an "archive" table; you probably still need to keep the data around in case the customer wants to start searching on a column, but if you currently don't search on it, you can get it out of the table doing the heavy lifting, which will drastically reduce query times.
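A rough sketch of that layout, with an archive table for the element names nobody currently searches on (all names and sizes here are illustrative):

    CREATE TABLE SearchEntries (
        RecordKey    varchar(50)  NOT NULL,  -- the "key": whatever uniquely identifies the source record
        ElementName  varchar(100) NOT NULL,  -- the "question": tag or attribute name
        ElementValue varchar(200) NULL,      -- the "answer": the extracted text
        CONSTRAINT PK_SearchEntries PRIMARY KEY (RecordKey, ElementName)
    );

    -- Same shape, holding the element names that are not currently searchable
    CREATE TABLE SearchEntriesArchive (
        RecordKey    varchar(50)  NOT NULL,
        ElementName  varchar(100) NOT NULL,
        ElementValue varchar(200) NULL
    );

    -- When a column stops being searchable, move its rows out of the hot table
    INSERT INTO SearchEntriesArchive (RecordKey, ElementName, ElementValue)
        SELECT RecordKey, ElementName, ElementValue
        FROM   SearchEntries
        WHERE  ElementName = 'MiddleName';

    DELETE FROM SearchEntries
    WHERE  ElementName = 'MiddleName';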

If you're looking for approximate values of a CLOB field, you simply aren't going to beat a LIKE query. Yes, it will be slow on very large text values; the only way to help that is to split that text up in a manner that won't cause false non-matches (where LIKE won't find a match across split boundaries), and I don't think you'll find a universal method of doing that; you must know something about what you're storing, such as that it's in paragraphs and a match will never cross paragraph boundaries anyway.

When all is said and done, I think you'll find that, regardless of data size, most SQL RDBMSes work rather well on just about any intelligent schema when given enough processor muscle. Searching on an index is logarithmic in nature as opposed to linear, and so a good indexing schema will help the engine break down the search space considerably.

KeithS
There is only going to be one column in the database called "SearchText". This isn't going to be XML data, but rather data extracted from an XML field. I hope this helps clarify things a bit.
Michael G
A: 

Some of my thoughts; maybe they can help.

1- As performance is critical in your case, denormalize your table and try to minimize joins.

2- Partition your table.

3- Identify the columns that are frequently used and create indexes on them (a quick sketch of these points is below).
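A minimal sketch of those ideas, assuming SQL Server and partitioning by CustomerId (the boundary values and object names are placeholders):

    -- 1: denormalized, so CustomerId lives directly on the record row and no join is needed
    -- 2: partition by customer so each search touches fewer pages
    CREATE PARTITION FUNCTION pfCustomer (int)
        AS RANGE LEFT FOR VALUES (1000, 2000, 3000);

    CREATE PARTITION SCHEME psCustomer
        AS PARTITION pfCustomer ALL TO ([PRIMARY]);

    CREATE TABLE SearchRecordsPartitioned (
        Id             int          NOT NULL,
        CustomerId     int          NOT NULL,
        SearchColumnId int          NOT NULL,
        SearchText     varchar(200) NULL
    ) ON psCustomer (CustomerId);

    -- 3: index the columns that are searched most often
    CREATE INDEX IX_SearchRecordsPartitioned_Customer_Column
        ON SearchRecordsPartitioned (CustomerId, SearchColumnId);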

saurabh