Hi all, I am running SQL queries on a MySQL table that holds 110M+ unique records covering a whole day.

Problem: Whenever I run any query with a WHERE clause, it takes at least 30-40 minutes. Since I need to generate most of the data on the next day, I need access to the whole table.

Could you please guide me on how to optimize or restructure this deployment?

Site description:

mysql  Ver 14.12 Distrib 5.0.24, for pc-linux-gnu (i686) using readline 5.0
4 GB RAM
Dual-core, dual-CPU, 3 GHz
RHEL 3

my.cnf contents :

[mysqld]
datadir=/data/mysql/data/
socket=/tmp/mysql.sock

sort_buffer_size = 2000000
table_cache = 1024
key_buffer = 128M
myisam_sort_buffer_size = 64M

# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
basedir=/data/mysql/data/

[mysqld_safe]
err-log=/data/mysql/data/mysqld.log
pid-file=/data/mysql/data/mysqld.pid

DB table details:

CREATE TABLE `RAW_LOG_20100504` (
  `DT` date default NULL,
  `GATEWAY` varchar(15) default NULL,
  `USER` bigint(12) default NULL,
  `CACHE` varchar(12) default NULL,
  `TIMESTAMP` varchar(30) default NULL,
  `URL` varchar(60) default NULL,
  `VERSION` varchar(6) default NULL,
  `PROTOCOL` varchar(6) default NULL,
  `WEB_STATUS` int(5) default NULL,
  `BYTES_RETURNED` int(10) default NULL,
  `RTT` int(5) default NULL,
  `UA` varchar(100) default NULL,
  `REQ_SIZE` int(6) default NULL,
  `CONTENT_TYPE` varchar(50) default NULL,
  `CUST_TYPE` int(1) default NULL,
  `DEL_STATUS_DEVICE` int(1) default NULL,
  `IP` varchar(16) default NULL,
  `CP_FLAG` int(1) default NULL,
  `USER_LOCATE` bigint(15) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=200000000;

Thanks in advance! Regards,

+2  A: 

Add an index to any field that appears in your WHERE clause. Primary keys must be unique, and unique indexes must be unique, but uniqueness is not a prerequisite for an ordinary index.

Badly defined or non-existent indexes are one of the primary causes of poor performance, and fixing them can often lead to phenomenal improvements.

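For example, if your reports filter on DT and WEB_STATUS (a guess on my part; substitute whichever columns actually appear in your WHERE clauses), an index like this is a starting point:

CREATE INDEX IDX_DT_STATUS ON RAW_LOG_20100504 (DT, WEB_STATUS);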

Frank V
@Frank: Thanks for the response. Do you feel changing the engine from MyISAM to InnoDB would help? The only reason for choosing MyISAM was to support 100M+ records in a single table. I would prefer to run more than one simultaneous query on the same table without impacting the other ongoing queries.
Vijay Gharge
I'm not entirely sure about changing the engine. I use MyISAM myself, but I don't have any tables in a MySQL database anywhere near the size of yours, so I'm not the best person to answer that question. That said, I'm sure you'd see an improvement by simply adding a few indexes.
Frank V
@Vijay: In the future, when you address a specific person, please add comments to the respective answer. I almost didn't notice you were asking me something.
Bill Karwin
Switching the table from MyISAM to InnoDB will indeed let you run simultaneous `SELECT` statements. This is also possible for MyISAM tables, but the conditions that permit that are more complex and rarely met.
staticsan
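(A conversion is a single statement, sketched below against the table from the question; note that it rewrites the entire table, which on 100M+ rows can take hours, so test on a copy first.)

ALTER TABLE RAW_LOG_20100504 ENGINE=InnoDB;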
@staticsan: Thanks for the response. Please note that I am referring to only one table with 100M+ unique records. Please let me know your feedback.
Vijay Gharge
+6  A: 

I would encourage you to learn how to use EXPLAIN to analyze the database's plan for query optimization. Also see Baron Schwartz' presentation EXPLAIN Demystified (link to PDF of his slides is on that page).
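For instance, prefixing the slow query with EXPLAIN shows which indexes are chosen and whether a temporary table or filesort is involved (the GROUP BY below is my reconstruction of the query under discussion):

EXPLAIN SELECT COUNT(*), WEB_STATUS
FROM RAW_LOG_20100504
WHERE CP_FLAG > 0
GROUP BY WEB_STATUS;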

Learn how to create indexes -- this is not the same thing as a primary key or an auto-increment pseudokey. See the presentation More Mastering the Art of Indexing by Yoshinori Matsunobu.

Your table could use an index on CP_FLAG and WEB_STATUS.

CREATE INDEX CW ON RAW_LOG_20100504 (CP_FLAG, WEB_STATUS);

This helps to look up the subset of rows based on your cp_flag condition.

Then you still run into MySQL's unfortunate inefficiency with GROUP BY queries. It copies an interim result set into a temporary file on disk and sorts it there. Disk I/O tends to kill performance.

You can raise your sort_buffer_size configuration parameter until it's large enough that MySQL can sort the result set in memory instead of on disk. But that might not work.
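For example (the 256M figure below is only an illustration, not a recommendation; remember that sort_buffer_size is allocated per connection):

SET SESSION sort_buffer_size = 268435456; -- 256M, for this session only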

You might have to resort to precalculating the COUNT() you need, and update this statistic periodically.
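A minimal sketch of that approach, using a hypothetical summary table (the names are mine, not anything standard):

-- Hypothetical summary table, rebuilt periodically (e.g. from cron).
CREATE TABLE STATUS_COUNTS (
  WEB_STATUS INT,
  CNT BIGINT
);

-- Run the expensive aggregation once, off-peak, so reports can
-- read the small summary table instead of scanning 100M+ rows.
DELETE FROM STATUS_COUNTS;
INSERT INTO STATUS_COUNTS
  SELECT WEB_STATUS, COUNT(*)
  FROM RAW_LOG_20100504
  WHERE CP_FLAG > 0
  GROUP BY WEB_STATUS;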


The comment from @Marcus gave me another idea. You're grouping by web status, and the set of distinct values of web status is a fairly short list and they don't change. So you could run a separate query for each distinct value and generate the results you need much faster than by using a GROUP BY query that creates a temp table to do the sorting. Or you could run a subquery for each status value, and UNION them together:

(SELECT COUNT(*), WEB_STATUS FROM RAW_LOG_20100504 WHERE CP_FLAG > 0 AND WEB_STATUS = 200)
UNION
(SELECT COUNT(*), WEB_STATUS FROM RAW_LOG_20100504 WHERE CP_FLAG > 0 AND WEB_STATUS = 404)
UNION
(SELECT COUNT(*), WEB_STATUS FROM RAW_LOG_20100504 WHERE CP_FLAG > 0 AND WEB_STATUS = 304)
UNION
...etc...
ORDER BY 1 DESC;

Because your covering index includes CP_FLAG and WEB_STATUS, these queries never need to read the actual rows in the table. They only read entries in the index, which they can access much faster because (a) they're in a sorted tree, and (b) they may be cached in memory if you allocate enough to your key_buffer_size.
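(key_buffer_size can be raised at runtime; 512M below is only an illustration for a 4 GB machine, and you would also update key_buffer in my.cnf so the change persists across restarts.)

SET GLOBAL key_buffer_size = 536870912; -- 512M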

The EXPLAIN report I tried (with 1M rows of test data) shows that this uses indexes well, and does not create a temp table:

+------+--------------+------------------+------+--------------------------+
| id   | select_type  | table            | key  | Extra                    |
+------+--------------+------------------+------+--------------------------+
|  1   | PRIMARY      | RAW_LOG_20100504 | CW   | Using where; Using index |
|  2   | UNION        | RAW_LOG_20100504 | CW   | Using where; Using index |
|  3   | UNION        | RAW_LOG_20100504 | CW   | Using where; Using index |
| NULL | UNION RESULT | <union1,2,3>     | NULL | Using filesort           |
+------+--------------+------------------+------+--------------------------+

The Using filesort for the last line just means it has to sort without the benefit of an index. But sorting the three rows produced by the subqueries is trivial and MySQL does it in memory.


When designing optimal database solutions, there are rarely simple answers. A lot depends on how you use the data and what kind of queries are of higher priority to make fast. If there were a single, simple answer that worked in all circumstances, the software would just enable that design by default and you wouldn't have to do anything.

You really need to read a lot of manuals, books and blogs to understand how to take most advantage of all the features available to you.


Yes, I would still recommend using indexes. Clearly it was not working before, when you were querying 100 million rows without the benefit of an index.

You have to understand that you must design indexes that benefit the specific query you want to run. I have no way of knowing if the index you just described in your comment is appropriate, because you haven't shown the other query you're trying to speed up.

Indexing is a complex topic. If you define the index on the wrong columns, or if you get the columns in the wrong order, it may not be usable by a given query. I've been supporting SQL developers since 1994, and I've never found a single, concise rule to explain how to design indexes.
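One concrete illustration of the column-order point, using the CW index created above (the queries themselves are hypothetical):

-- CW (CP_FLAG, WEB_STATUS) supports lookups on a leftmost prefix:
SELECT COUNT(*) FROM RAW_LOG_20100504 WHERE CP_FLAG = 1;
SELECT COUNT(*) FROM RAW_LOG_20100504 WHERE CP_FLAG = 1 AND WEB_STATUS = 200;
-- But it cannot seek on WEB_STATUS alone, because it is not the leftmost column:
SELECT COUNT(*) FROM RAW_LOG_20100504 WHERE WEB_STATUS = 200;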

It sounds like you need a mentor, because you're at a stage where you need a lot of questions answered. Is there someone where you work whom you could ask for help?

Bill Karwin
@Bill, does COUNT(*) use the covering index?
Marcus Adams
@Bill: Sorry for that. I got lost because of the many "Add comment" options. I tried `CREATE INDEX USER ON RAW_LOG_20100503 (MSISDN, BYTES_RETURNED, REQ_SIZE);` but even after 41614 seconds it had not finished, and I had to abort the query with Ctrl+C. Would you still recommend using an index? It looks like indexes on 100M+ records do not give the best performance. One more thing: I will be using a new table every day. How will indexing work in that situation?
Vijay Gharge
@Bill: I understand that there won't be a "one solution fits all" approach. But I want to understand how we can optimize MySQL in my case. Hence, I request your valuable views/feedback to help me improve MySQL performance.
Vijay Gharge