ansaurus

Question

Difference between performance of the two sql queries?

Answer 1

+2 A:

If fieldname is indexed, most of commercial databases can transform the first query into an interval search

select * from tablename where fieldname>="xyz" and fieldname<"xy{"

which is very fast.

grep 2010-08-15 06:40:07

what about the second one, does it have any effect.

iamrohitbanga 2010-08-15 07:20:01

btw xyz could be anything. It could be '2wssdfj,fsf@sef34' for example.

iamrohitbanga 2010-08-15 07:21:52

@iamrohitbanga this conversation become more and more abstract. Actually nobody knows what do you mean or want. Can you please devise an example on some real life grounds? Thanks

Col. Shrapnel 2010-08-15 07:32:41

In which case your interval search would be >='2wssdfj,fsf@sef34' and < '2wssdfj,fsf@sef35'. The key point is that incrementing the ASCII value of the last character and doing a range search is identical to the 'string%'.

JulesLt 2010-08-15 13:51:11

Answer 2

+6 A:

There is definitely difference between performance on all DB's. First case will be definitely faster if column is indexed.

I had similar instance in my project where user was also allowed to search "ends with" (like your second query).

As this was frequently used operation and query was slow,

We added additional column to table which stored reverse of fieldname.
indexed this column
whenever ends with was searched , we searched in this new column :) (by reversing original search string)

so your second query becomes:

 select * from tablename where fieldname_rev="xyz%";

This approach made it as fast as starts with query.

YoK 2010-08-15 07:36:21

@Col. Shrapnel thats what i am looking for. @YoK is this documented anywhere in the theory or in db manuals.Can you cite some such source? Thanks a lot.

iamrohitbanga 2010-08-15 07:55:27

@iamrohitbanga I don't know is it in theory or db manuals ? But as I told I am writing this from my project experience. I will try to find is it in some theory.

YoK 2010-08-15 08:15:43

@iamrohitbanga lol so you can't even tell what are you looking for without a help of someone? You need to work on that. Knowing exactly what you want to do is always helps

Col. Shrapnel 2010-08-15 08:36:43

Hey Col. I mentioned the same thing before, it is just that YoK has had a practical experience with this issue. He has additionally reworded the problem.I was just curious to find out if there is any difference in the performance. The use case is general. I just want to know if a starts with match is diff from an ends with match. I feel that if the length of the string is used in implementing the ends with, there should not be much difference in the implementations. However if you use a lexical analyzer it would read the string from beginning to the end. But what about dbs? what do they use?

iamrohitbanga 2010-08-15 09:34:19

Answer 3

+3 A:

Picking up from your comment : " I just want to know if a starts with match is diff from an ends with match".

Firstly - remember that we are not looking for the best algorithm to match a string. We are looking for the best algorithm to find all matching strings in a set of N rows. We want to do better than 'Do algorithm X, N times'.

If fieldname is NOT indexed, then there will be very little difference in performance between the two queries - the SQL engine is just going to do a match on the first 3 or last 3 bytes of the string, which is simply a matter of offsetting to the right memory location.

If the fieldname IS indexed, there will be a huge difference in performance between the two searches, because rather than examining all N rows, we can discard most of the data.

i.e. for the "xyz%" version, we can use a binary search.

We start at the middle element, which happens to be 'peter'. We can immediately discard everything before 'peter' and get the middle element on the remainder - 'samantha', and so on, until we find the entries starting 'xyz'.

With the "%xyz" version, we cannot do this, as ANY string could potentially match at the end, we need to look at every string.

As the size of our table expands, the difference between these two approaches becomes large.

The solution of creating a field/index for the reverse of fieldname allows us to use the binary search technique again. (In some databases it is actual possible to do this without creating an extra field, but through using particular index types, virtual columns, etc).

This is simplified a lot - for detail on the actual implementation of database indexes, look into B-Tree and B*Tree indexes.

JulesLt 2010-08-15 13:45:59

Answer 4

+1 A:

Yes, there is a difference between the following two queries:

select * from tablename where fieldname LIKE "xyz%";
select * from tablename where fieldname LIKE "%zyx";

The equals ("=") operator doesn't allow wildcards in SQL - you need to use LIKE
The queries are entirely different
- "xyz%" will return records that start with "xyz"
- "%xyz" will return records that end with "xyz"
Assuming an index exists on the fieldname column, "%xyz" can not use the index - but"xyz%" could, which means it would be faster.

The fastest means of finding substrings within text is to use Full Text Search (FTS) - both Oracle and MySQL have their own native functionality, and there are 3rd party tools like Sphinx and Solr.

OMG Ponies 2010-08-15 21:41:43

ansaurus

tags:

views:

answers:

Difference between performance of the two sql queries?

related questions