tags:

views:

302

answers:

4

Hi,

I am trying to query the windows desktop search API using SQL.

I have to say I really HATE the windows 7 search GUI, and so I decided to write my own. I have a lot of files indexed (approx 1.000.000), and I want to do a search for names. Something like: Show me every name which contains "bunny".

But here i run into a performance problem. Searching for

"SELECT \"System.ItemPathDisplay\" FROM \"SystemIndex\" WHERE System.FileName LIKE 'egon%'";

is really fast. Also the %egon alternative. But %egon% takes forever. I am not sure if it is in the nature of the index (I understand that the possibilities increase enormously) or if I am doing something wrong...

Ok, now my actual question: Is it correct that the windows index is only a big SQL database. If so, where can I find exact information about the structure of the DB (primary keys, indexes). If I have that, its basically just optimizing SQL.

Alternative question: Does anybody knows a fast SQL statement to find all files with egon somewhere in the name...

Chris

[Edit why I do not like the search GUI]

Well, its just not intuitive, compared to XP. If you disable the dog and use the old XP interface, I could create a search query like: All files older than 1 month, bigger than 10 MB, name pattern STAR_homework_STAR.docx - Try this in Windows 7 without "learning" the syntax. And hell, I do NOT want to learn another syntax just to find one file...

The other main problem are maybe my search habits. Most of the time I somehow know the file name (or parts) and simply want the location. And if you use the search this way you ran into several problem:

  • First of all, you always have to prefix it with name:
  • Then the folder name layout is stupid (it is ordering by parent folder, not full path, I THINK, because.. tada... see next point)
  • Then, even more annyoing, if you have a list of results and you try to sort them, it takes forever...

And now I really think my system has a bug. I tried to quickly check it, searched in some average size folder for "test" and he found some files. Then I tried to sort them for folders (to verify my second point) and now he is just searching forever... I mean really, while I am typing he tries to find the word "hello"... oh, finished - he found approx 20 files. So, now, lets try something.... Ok, now it seems like he has recovered.. But still, to slow for my taste...

So, enough cursing about search :-)

A: 

This is slow because you are unable to use an index. The reason is that you are searching for a match anwhere in the string rather than at the start of the string which means you must scan the entire table for the contents.

lomaxx
+2  A: 

It looks like they're building an index on the name, so it can use the index as long as you've specified the beginning of the string, but if you haven't, it has to use a table scan.

Assuming they're using Microsoft's full-text search engine, then try using something like:
... WHERE system.filename CONTAINS 'egon'

There are basically two choices: it'll be rejected as invalid (i.e. this SQL interface doesn't support their F-T search extension) or else it'll be quite a bit faster.

EDIT:Oops -- the syntax should be "contains(system.filename, 'egon')". Sorry 'bout that.

Jerry Coffin
Definitily 1000 times faster (just 1 second)...
Christian
A: 

Maybe try

"SELECT \"System.ItemPathDisplay\" FROM \"SystemIndex\" WHERE CONTAINS(System.FileName, 'egon')";
Jason Down
A: 

What about using another Search tool?I recommend Lookeen, it has very fast indexing of data and contains a lot of usefull filters:-)

Carlos