Hi All,

I have a MySQL indexing question for you guys.

I've got a very large table (~100 million records) in MySQL that contains information about files. Most of the queries I run on it involve substring operations on the file path column.

Here's the table DDL:

CREATE TABLE `filesystem_data`.`$tablename` (
    `file_id` INT( 14 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `file_name` VARCHAR( 256 ) NOT NULL ,
    `file_share_name` VARCHAR ( 100 ) NOT NULL,
    `file_path` VARCHAR( 900 ) NOT NULL ,
    `file_size` BIGINT( 14 ) NOT NULL ,
    `file_tier` TINYINT(1) UNSIGNED NULL, 
    `file_last_access` DATETIME NOT NULL ,
    `file_last_change` DATETIME NOT NULL ,
    `file_creation` DATETIME NOT NULL ,
    `file_extension` VARCHAR( 50 ) NULL ,
    INDEX ( `file_path`, `file_share_name` ) 
    ) ENGINE = MYISAM;

So, for example, I'll have a row with a file_path like:

'\\Server100\share2\Home\Zenshai\My Documents\'

And I'll extract the user's name (Zenshai in this example) with something like:

SELECT substring_index(substring_index(fp.file_path,'\\',6),'\\',-1) as Username
FROM (SELECT '\\\\Server100\\share2\\Home\\Zenshai\\My Documents\\' as file_path) fp

It gets a bit ugly, but that's not really my concern right now.

What I'd like some advice on is what kind of index (if any at all) can help speed up these types of queries on this table. Any other suggestions are welcome too.

Thanks.

P.S. Although the table gets very large, there is enough space for indexes.

+1  A: 

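You cannot use an index for these substring queries with your current table design, because they operate on an expression computed from file_path rather than on the stored column value.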

You may add a column called USERNAME, fill it in the INSERT/UPDATE trigger with the expression you use in SELECT, and search on this column.
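A rough sketch of what that could look like, assuming the table is named `files` (a hypothetical stand-in for the templated `$tablename`), that the user name is always the sixth backslash-delimited segment as in the question's example, and that VARCHAR(100) is wide enough. The column, index, and trigger names are illustrative only:

```sql
USE filesystem_data;

-- Add the derived column (name and width are assumptions).
ALTER TABLE files ADD COLUMN username VARCHAR(100) NULL;

-- One-time backfill of existing rows, using the expression from the question.
UPDATE files
SET username = SUBSTRING_INDEX(SUBSTRING_INDEX(file_path, '\\', 6), '\\', -1);

-- Index the column after the backfill, so the index is built once
-- rather than maintained row by row during the UPDATE.
ALTER TABLE files ADD INDEX idx_username (username);

-- Keep the column in sync on future writes.
CREATE TRIGGER files_before_insert BEFORE INSERT ON files
FOR EACH ROW
SET NEW.username = SUBSTRING_INDEX(SUBSTRING_INDEX(NEW.file_path, '\\', 6), '\\', -1);

CREATE TRIGGER files_before_update BEFORE UPDATE ON files
FOR EACH ROW
SET NEW.username = SUBSTRING_INDEX(SUBSTRING_INDEX(NEW.file_path, '\\', 6), '\\', -1);

-- Lookups can now use idx_username instead of scanning every file_path.
SELECT file_id, file_path
FROM files
WHERE username = 'Zenshai';
```

On a table of this size, the backfill and the index build will each take a while, so it is probably worth trying them on a copy first.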

P.S. Just curious: do you really have 100 million+ files on your server?

Quassnoi
It's not just one server, and it's not 'mine' at all. But yes, a lot more actually.
Zenshai
Also, thank you for your answer. I will try that; it will probably be worth the extra time spent inserting to have faster queries.
Zenshai
+1  A: 

I'd create a tiny (columns, not record count) subtable that would have the file path broken out and stored like so:

FK_TO_PARENT    PATH_PART
1               Server100
1               share2
1               Home
1               Zenshai
1               My Documents

And then just index PATH_PART. Of course if the parent table is 100 Million plus, then this would be going into the billions of records.
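As a minimal sketch of that layout, assuming the parent table is named `files` (hypothetical, standing in for `$tablename`) and using illustrative names for the subtable and its index. The prefix length on the index is an assumption chosen to stay well under MyISAM's 1000-byte key limit regardless of character set:

```sql
USE filesystem_data;

CREATE TABLE file_path_parts (
    file_id   INT NOT NULL,           -- FK_TO_PARENT (MyISAM does not enforce foreign keys)
    path_part VARCHAR(255) NOT NULL,  -- one path segment per row
    INDEX idx_path_part (path_part(100))
) ENGINE = MYISAM;

-- The example path from the question, broken out into one row per segment.
INSERT INTO file_path_parts (file_id, path_part) VALUES
    (1, 'Server100'),
    (1, 'share2'),
    (1, 'Home'),
    (1, 'Zenshai'),
    (1, 'My Documents');

-- Find every file whose path contains the segment 'Zenshai'.
SELECT f.file_id, f.file_path
FROM file_path_parts p
JOIN files f ON f.file_id = p.file_id
WHERE p.path_part = 'Zenshai';
```

Note that this matches the segment at any depth of the path. If the position matters (for example, only the segment right after Home), an extra position column would be needed, which is not part of the layout described above.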

Barry