ansaurus

Question

Answer 1

+2 A:

What kind of indexes do you have on that table? This index:

CREATE INDEX fs_search_idx ON fs_walk_scan(file_path, file_size desc)

would speed this query up significantly... if you don't already have one like it.

Update:

You said there are already indexes on file_path and file_size... are they individual indexes? Or is there one single index with both columns indexed together? The difference would be huge for this query. Even with 22 subqueries, if indexed right, this should be blazing fast.

bobwienholt 2008-10-02 19:03:40

I think this would be impossible with the condition that changes to the structure of the db are not allowed

warren 2008-10-02 19:07:31

they are individual, but i don't see how linking them into one would help

Zenshai 2008-10-02 19:17:58

Essentially, you are searching on file_path and then within that you are picking the 30 largest files (using LIMIT and ORDER BY). So you are essentially searching on two columns at the same time. If both pieces of data you are searching on is available in the index together, searching is fast.

bobwienholt 2008-10-02 19:53:21

Answer 2

A:

How about something like this (haven't tested it, but looks close):

select * from fs_walk_scan where file_path like '\\\\server' and file_path like 'root$\\%' order by file_size desc

This way you're doing a pair of comparisons on the individual field which will generically match what you've described. It may be possible to use a regex, too, but I've not done it.

warren 2008-10-02 19:04:06

Are you suggesting doing this in each UNIONed query to improve performance (cause that could help) or did I misunderstand?

Zenshai 2008-10-02 19:22:14

Answer 3

+2 A:

You could use a regexp:

select * from fs_walk_scan
  where file_path regexp '^\\\\server(1\\[ghi]|2\\[fg]|3\\h|[45]\\i)root$\\'

Otherwise if you can modify your table structure, add two columns to hold the server name and base path (and index them), so that you can create a simpler query:

select * from fs_walk_scan
  where server = 'server1' and base_path in ('groot$', 'hroot$', 'iroot$')
     or server = 'server2' and base_path in ('froot$', 'groot$')

You can either set up a trigger to initialise the fields when you insert the record, or else do a bulk update afterwards to fill in the two extra columns.

dland 2008-10-02 20:10:08

Answer 4

A:

Thanks, these are helpful. However this doesn't solve my main problem.

If i do (for exapmle):

select * from fs_walk_scan
  where file_path regexp '^\\\\server(1\\[ghi]|2\\[fg]|3\\h|[45]\\i)root$\\'
  order by file_size desc
  LIMIT 0,30

I will get the top 30 files across all 8 volumes, 30 rows in total. What I am trying to get is the top 30 per volume, 240 rows in total.

I think this going to need some looping and variables. Unfortunately I don't know how to use that stuff in MySQL.

Zenshai 2008-10-02 20:27:04

Answer 5

+1 A:

You could do something like this... assuming fs_list has a list of your "LIKE" searches:

DELIMITER $$

DROP PROCEDURE IF EXISTS `test`.`proc_fs_search` $$
CREATE PROCEDURE `test`.`proc_fs_search` ()
BEGIN

DECLARE cur_path VARCHAR(255);
DECLARE done INT DEFAULT 0;


DECLARE list_cursor CURSOR FOR select file_path from fs_list;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

SET @sql_query = '';

OPEN list_cursor;

REPEAT
  FETCH list_cursor INTO cur_path;

  IF NOT done THEN
    IF @sql_query <> '' THEN
      SET @sql_query = CONCAT(@sql_query, ' UNION ALL ');
    END IF;

    SET @sql_query = CONCAT(@sql_query, ' (select * from fs_walk_scan where file_path like ''', cur_path , ''' order by file_size desc limit 0,30)');
  END IF;

UNTIL done END REPEAT;

SET @sql_query = CONCAT(@sql_query, ' order by file_path, file_size desc');

PREPARE stmt FROM @sql_query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

END $$

DELIMITER ;

bobwienholt 2008-10-02 22:36:47

Answer 6

+1 A:

Try this.
You want to get every record where there are fewer than 30 records with greater file size and the same file path.

SELECT * 
FROM   fs_walk_scan a
WHERE  ( SELECT COUNT(*) 
         FROM   fs_walk_scan b 
         WHERE  b.file_size  > a.file_size 
         AND    b.file_path  = a.file_path
       ) < 30

Edit:

Apparently this performs like a dog. So... How about this looping syntax?

SELECT DISTINCT file_path
INTO tmp1
FROM   fs_walk_scan a

DECLARE path VARCHAR(255);

SELECT MIN(file_path)
INTO   path
FROM   tmp1 

WHILE  path IS NOT NULL DO
    SELECT * 
    FROM   fs_walk_scan
    WHERE  file_path = path
    ORDER BY file_size DESC
    LIMIT 0,30

    SELECT MIN(file_path)
    INTO   path
    FROM   tmp1
    WHERE  file_path > path 
END WHILE

The idea here is to 1. get a list of the file paths 2. loop, doing a query for each path which will get the 30 largest file sizes.

(I did look up the syntax, but I'm not very hot on MySQL, so appologies if it's not quite there. Feel free to edit/comment)

AJ 2008-10-03 11:27:15

tried the above. left the query running for 48 hours (by accident), and it still hasn't come back with a result.

Zenshai 2008-10-09 15:36:50

ouch. let me go away and think about that, then! If I come up with something better, I'll edit my post.

AJ 2008-10-09 15:42:38

Thanks for helping, i really appreciate it. However i dont think this updated method will change anything performance-wise. The problem is that we are still doing a "select * from fs_walk_scan" in every iteration of the while loop, and this is very expensive because the table is huge.

Zenshai 2008-10-09 18:59:07

What can I say? The table _is_ huge. It's going to be slow. Another strategy would be to maintain a "top 30" list using a trigger, but that wouldn't really be more efficient, and it would slow down inserts.

AJ 2008-10-10 11:17:54

You're right it'll be slow no matter what. However I think there may be a way to do it right. What im thinking of right now is using a perl (but eventually native) script to go through the table one row at a time, and test the filesize of the row against the lowest of a top 30 list for its path loc.

Zenshai 2008-10-13 15:40:31

So there will be a top 30 array for each of the volumes, and upon encountering each row ill find the array it can potentially go into, and test whether its filesize is greater than the lowest in the list, if it is, ill find the right place to insert it and discard the lowest filesize to make room.

Zenshai 2008-10-13 15:43:37

ansaurus

tags:

views:

answers:

Converting a UNION query in MySQL

Edit:

related questions