I am currently trying to optimize some BusinessObjects (BObj) reports where our backend is Teradata. The Teradata optimizer seems very finicky, and I was wondering if anyone has come up with a solution or workaround to get the optimizer to treat LIKE predicates in a similar regard to equality predicates.

My issue is that we allow the user to search using one of two methods:
 1. Enter the Number:
    or
 2. Enter a Number like:

Option one performs like a dream, while option two drags our query times from 6 seconds up to 2 minutes.
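For reference, the two options end up generating predicates along these lines (column name simplified):

    -- Option 1: equality predicate
    WHERE Number = '12345'

    -- Option 2: wildcard predicate
    WHERE Number LIKE '123%'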

In addition to this, does anyone know of any good articles, discussions, videos, etc. on optimizing SQL statements for the Teradata optimizer?

+1  A: 

I take it that Number is indexed? Teradata uses hashing for indexing, so an equality comparison will result in the index being used, while LIKE will result in a full-table scan.

If you have a genuine need for LIKE, there's not an awful lot you can do. One thing you could try is using Substr(Number, 1, 3) = '123' rather than Number LIKE '123%'. I've gotten small performance improvements from this in the past, but don't expect anything spectacular.
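For example, a minimal sketch of the rewrite (table name is made up for illustration):

    -- Wildcard form: prefix match via LIKE
    SELECT *
    FROM MyTable
    WHERE Number LIKE '123%';

    -- Rewritten form: compare a fixed-length prefix instead
    SELECT *
    FROM MyTable
    WHERE Substr(Number, 1, 3) = '123';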

lins314159
Thanks for your response. Yes, unfortunately we do have to provide some sort of wildcarding method. I will definitely take a look at Substr on some of our inputs. Do you have any metrics on your improvement? We are searching upwards of 10 million rows, so my assumption is that even a small improvement could have a large impact.
tathamr
The number is just an example of our searching; this could be strings as well. I am also trying to think through the implementation of this. The user has the ability to input a string of arbitrary length, for both strings and numbers. I wonder if I can post-process the input to build the correct Substr call.
tathamr
Sorry, this was at my previous workplace, so I don't have any numbers with me.
lins314159
+1  A: 

You will need a full-text / pre-tokenized index (e.g. Lucene) and a two-pass search.

For example, when inserting "12345" into your database, create links from "1", "12", "123", "1234", etc. to "12345".

Then, when the user searches for something like "123*", find "123" in the lookup table and seek to the record "12345".
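A rough sketch of that lookup-table idea in SQL (table and column names are invented; a real full-text engine such as Lucene would manage the tokenizing for you):

    -- Lookup table mapping each stored prefix to the full value
    CREATE TABLE NumberPrefix (
        Prefix    VARCHAR(20) NOT NULL,
        FullValue VARCHAR(20) NOT NULL
    ) PRIMARY INDEX (Prefix);

    -- Populated at insert time: one row per prefix of '12345'
    INSERT INTO NumberPrefix VALUES ('1',    '12345');
    INSERT INTO NumberPrefix VALUES ('12',   '12345');
    INSERT INTO NumberPrefix VALUES ('123',  '12345');
    INSERT INTO NumberPrefix VALUES ('1234', '12345');

    -- Pass 1: equality match on the prefix (hashed, index-friendly);
    -- Pass 2: join back to the base table on the full value
    SELECT t.*
    FROM NumberPrefix p
    JOIN MyTable t
      ON t.Number = p.FullValue
    WHERE p.Prefix = '123';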

Dennis Cheung
A: 

If you are doing a direct VARCHAR comparison, i.e.

Column LIKE 'VALUE'

then you could try using a NUSI (non-unique secondary index) on that column. Make sure that you collect statistics for the table's primary index and for the secondary index.
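In Teradata DDL that would look something like the following (MyTable and MyColumn are placeholders):

    -- Create a non-unique secondary index (NUSI) on the search column
    CREATE INDEX (MyColumn) ON MyTable;

    -- Collect statistics on the primary index and on the new NUSI
    COLLECT STATISTICS ON MyTable COLUMN (MyPIColumn);
    COLLECT STATISTICS ON MyTable INDEX (MyColumn);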

bogertron
Is it possible to set something as both a NUSI and the Primary Index of a table? The issue is that this column is the PI of the table, but when using the LIKE operator the optimizer bypasses it.
tathamr
@tathamr: If the Primary Index is composed of additional columns you should be fine. However, if the PI is only that VARCHAR column, then you will not be able to create a secondary index on top of it.
bogertron
+1  A: 

Because the column is defined as a VARCHAR and you are using the LIKE operator, you eliminate the possibility of using the PI for single-AMP access. Remember, the primary index's first job is distributing the data across the AMPs in the system. Because you are using the LIKE operator against the PI, the optimizer must perform an 'all-AMP' operation to satisfy it.

WHERE MyPIColumn LIKE '123%'

Values starting with '123' can and will hash to multiple AMPs.

WHERE MyPIColumn = '123'

The hashing of '123' places every record with that exact value on the same AMP, so querying for '123' will always be a single-AMP operation.

Statistics on this column may help with row estimates but will likely not eliminate the 'all-AMP' operation.
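You can see the difference in the query plan with EXPLAIN, for example (MyTable is a placeholder):

    -- The LIKE form should show an all-AMPs retrieve by way of an
    -- all-rows scan ...
    EXPLAIN
    SELECT * FROM MyTable
    WHERE MyPIColumn LIKE '123%';

    -- ... while the equality form should show a single-AMP retrieve
    -- by way of the primary index.
    EXPLAIN
    SELECT * FROM MyTable
    WHERE MyPIColumn = '123';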

  1. Is this a unique PI or a non-unique PI?
  2. Why was a character data type chosen over numeric? (Although GT(E) or LT(E) comparisons would likely result in the same 'all-AMP' operation.)
  3. Is this PI shared by other tables in the model to facilitate AMP-local join strategies?
RobPaller
Rob, thank you for the reply. The numeric is just an example. We were given a requirement that a user must be able to wildcard just about anything in our query set. This is not optimal because in many cases we have 4-5 tables (sometimes left/right outer joined) with 4-7 million rows each.
tathamr
You may wish to consider surrogate keys that are not used for accessing the data. This would allow you to build a NUSI on the values that are used as selection criteria, in the hope that the optimizer will consider the NUSI for accessing the data at least some of the time, which it should, depending on the data demographics of the column in the statistics. I would suggest incorporating a control table that stores the last surrogate value used for each table, to avoid scanning the larger tables for the seed value of the surrogate. A rough sketch of that layout follows.
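All object names below are hypothetical:

    -- Control table holding the last surrogate value issued per table,
    -- so new keys can be seeded without scanning the large tables
    CREATE TABLE SurrogateControl (
        TableName     VARCHAR(128) NOT NULL,
        LastSurrogate BIGINT       NOT NULL
    ) UNIQUE PRIMARY INDEX (TableName);

    -- Base table: surrogate key as the PI; the searched value gets a NUSI
    CREATE TABLE MyBigTable (
        RowId  BIGINT      NOT NULL,  -- surrogate key, used for access/joins
        Number VARCHAR(20) NOT NULL   -- user-searched value
    ) UNIQUE PRIMARY INDEX (RowId);

    CREATE INDEX (Number) ON MyBigTable;             -- NUSI for selection
    COLLECT STATISTICS ON MyBigTable INDEX (Number); -- data demographics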
RobPaller