tags:

views:

57

answers:

3

Simple situation, two column table [ID, TEXT]. The Text column has 1-10 word phrases. 300,000 rows.

Running the query:

SELECT * FROM row 
 WHERE text LIKE '%word%' 

...took 0.1 seconds. Ok.
So I created a 2nd column, the table now has: [ID, TEXT2, TEXT2] I made TEXT2 = TEXT (using an UPDATE table SET TEXT2 = TEXT]

Then I run the query for '%word%' again, and it takes 2.4 seconds.


This leaves me very very stumped but after quite a lot of blind alleys, I run OPTIMIZE on the table, and it goes to about 0.2 seconds.

Two questions:

  1. Does anyone know how the data structure get's itself in such a mess whereby doubling the data increases the search time for this query by a factor of 24?
  2. Is it standard for an un-indexed search like this to increase at the rate of the underlying table data structure as opposed to the data in the actual column being searched?

Thanks!

A: 

Sounds to me like you are the victim of Query caching. The second time your run the query (after the optimize), it already has the answer cached, and therefore the result is returned instantly. Have you tried searching for different search terms. Try running the query with caching turned off as so:

SELECT SQL_NO_CACHE * FROM row WHERE text LIKE '%word%'

To see if this changes the results, or try searching for different words, but with similar number of results to ensure that your server isn't just returned a cached value.

Kibbee
If it were cached, shouldn't it return the answer as fast if not *faster* than previously?
OMG Ponies
The query that takes 2.4 seconds takes place after an update statement, so it cannot use the query cache, because the underlying data has changed. Optimize table however doesn't change the underlying data, so the result is retuned quickly. I'm thinking that the optimize table command doesn't even matter, and that it's actually running faster because the result is cached, and the underlying data hasn't changed.
Kibbee
But the OP is asking why it took longer...
OMG Ponies
This is not correct; I did not run the query once, but ran it 100 times and got averages.
Chris Padfield
I updated my test script. It now rans each query twice (I am testing different words each time). So this is table after optomize:840 results, 0.12428307533264 seconds :: sacrifice840 results, 0.1253387928009 seconds :: sacrifice2407 results, 0.15418601036072 seconds :: let2407 results, 0.15322089195251 seconds :: let360 results, 0.13615202903748 seconds :: metal360 results, 0.13690900802612 seconds :: metal2284 results, 0.14835405349731 seconds :: tune2284 results, 0.14796185493469 seconds :: tunerun the queries:UPDATE table SET subject2 = '';UPDATE table SET subject2 = subject;
Chris Padfield
So actually the UPDATE table did not 'work' it was still fast . I had to drop the column, re-create it and re-fill the data. Then I run my test and get.602 results, 1.5551729202271 seconds :: strange602 results, 1.5191509723663 seconds :: strange840 results, 1.5133099555969 seconds :: native840 results, 1.5220057964325 seconds :: native3252 results, 1.5466899871826 seconds :: line3252 results, 1.5464508533478 seconds :: line722 results, 1.5443091392517 seconds :: aside722 results, 1.5314869880676 seconds :: aside
Chris Padfield
Sorry, formatting is bad. But basically it is showing that running the search twice makes no difference, it's ~0.15 seconds for "good table" and ~1.5 seconds for "bad table". It's not a cache thing going on. Interestingly, as well as OPTOMIZE - changing the columns from VARCHAR to CHAR had the same effect.
Chris Padfield
// note that Optimize table however doesn't change the underlying data is incorrect. It defrags the table. http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
Chris Padfield
A: 

The first time it does a table scan which sounds about right for the timing - no index involved.

Then you added the index and the mysql optimizer doesn't notice you've got a wildcard on the front, so it scans the entire index to find the records, then needs two more reads (one to the PK, then one into the table from there) to get the data record on top of that.

OPTIMIZE probably just updates the optimizer statistics so it knows it should scan the table again.

le dorfier
There is no index on any of the tables. Both queriers are doing full table scans as exepcted.
Chris Padfield
Do you end up with identical file sizes? (This is MyISAM tables, right?) And exactly what problem are you trying to solve, given that you are using a malignant wildcard query? I'm guessing it's tipping the table over the edge from what can be loaded into memory and what requires disk swapping, but you can find that out from the database statistics.
le dorfier
Running optomize takes it from 35MB to 28MB. But I don't think this is the issue, because I also tried by changing the VARCHAR columns to CHAR(250) columns. This made the table about 70MB but also had the same effect as OPTOMIZE making it ~ factor of 20 times faster.
Chris Padfield
A: 

I would think that the difference is caused by the increased row length causing the table to be fragmented on the disk. Optimize will sort that problem out, leading to the search time returning to normal (give or take a bit).

Brian Hooper