Google, Gmail, etc. don't offer partial or prefix search (e.g. stuff*), though it could be very useful. I often can't find a mail in Gmail because I don't remember the exact expression.

I know there is stemming and such, but it's not the same, especially if we talk about languages other than English.

Why doesn't Google add such a feature? Is it because the index would explode? But databases offer partial search, so surely there are good algorithms to tackle this problem.

What is the problem here?

+2  A: 

Because you can't sensibly derive what is meant with car*:

Cars? Carpets? Carrots?

Google's algorithms compare document texts, as well as external inbound links, to determine what a document is about. With wildcards, all of these algorithms turn to junk.

Developer Art
It should return all results in this case. The user wants it, the user gets it.
tom
I suppose it could be done technically but for most humans it would probably make no sense. Maybe submit a request to Google. Who knows, maybe it's a great idea they simply missed?
Developer Art
Google misses the idea of pattern globbing? I hardly think so...
Stefano Borini
+3  A: 

Google doesn't actually store the text that it searches. It stores search terms, links to the page, and where in the page each term occurs. That data structure is indexed in the traditional database sense. I'd bet using wildcards would make the index of the index pretty slow and, as Developer Art says, not very useful.
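
To make the structure described above concrete, here is a minimal sketch of a term index that stores, for each term, the documents and positions where it occurs. It is only an illustration, not how Google actually implements anything; `build_index`, `prefix_terms`, and the sample documents are hypothetical names made up for this example. It also shows why a prefix such as car* is costly: the query has to be expanded into every indexed term that starts with the prefix before any postings can be read.

```python
from bisect import bisect_left
from collections import defaultdict

def build_index(docs):
    """Map each term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].append((doc_id, pos))
    return dict(index)

def prefix_terms(sorted_terms, prefix):
    """Return every indexed term that starts with prefix.

    Binary search finds the first candidate; the scan stops at the
    first term that no longer matches.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

# Hypothetical sample data.
docs = {
    1: "cars and carpets",
    2: "carrots for sale",
    3: "used cars",
}
index = build_index(docs)
terms = sorted(index)

# car* expands to several unrelated terms, each with its own postings.
expanded = prefix_terms(terms, "car")
hits = sorted({doc_id for t in expanded for doc_id, _ in index[t]})
print(expanded, hits)
```

On a vocabulary of three documents the expansion is trivial; on a web-scale vocabulary a short prefix could match a very large number of terms, which is presumably where the slowdown would come from.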

Byron Whitlock
+1  A: 

Google Code Search can search based on regular expressions, so they do know how to do it. Of course, the amount of data Code Search has to index is tiny compared to the web search. Using regex or wildcard search in the web search would increase index size and decrease performance to impractical levels.

interjay
A: 

The secret to finding anything in Google is to enter a combination of search terms (or quoted phrases) that are very likely to be in the content you are looking for, but unlikely to appear together in unrelated content. A wildcard expression does the opposite of this. Just enter the terms you expect the wildcard to match, keeping in mind that Google will do stemming for you. Back in the days when computers ran on steam, Lycos (iirc) had pattern matching, but they turned it off several years ago. I presume it was putting too much load on their servers.

Hugh Brackett
A: 

Google does search partial words; Gmail does not, though. Since you ask what the problem is here, my answer is lack of effort. This problem has a known solution, suffix trees, which allow searching in time proportional to the length of the query (independent of the size of the indexed text) using linear space, although they are not very cache-friendly. Suffix arrays are another option that is more cache-friendly and still time-efficient.
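
As a rough sketch of the suffix array idea, assuming a single small text held in memory: all suffix start offsets are sorted once, and any substring query then becomes two binary searches over that sorted list. The function names are made up for this example, and the construction shown is the naive one (sorting full suffixes), which is far slower than the linear-time algorithms a real system would use.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted order.

    Naive construction for illustration only; it compares whole suffixes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, sa, pattern):
    """Return the sorted start offsets at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])

# Hypothetical sample text.
text = "carpets carrots cars"
sa = build_suffix_array(text)
print(find_all(text, sa, "car"))
print(find_all(text, sa, "rot"))
```

Each binary search does O(log n) comparisons of at most m characters, so a query costs O(m log n) in this plain form. Unlike a word-level index, it matches fragments in the middle of words as well as prefixes.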

Rui Ferreira
An alternative to suffix trees is n-grams, which are performant, just not storage-efficient. But a solution nonetheless.
Cody Caughlan
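
The n-gram approach from the comment above can be sketched as follows, assuming trigrams (n = 3) and a small in-memory word list; `trigrams`, `build_trigram_index`, and `search` are hypothetical names for this example. Each word is indexed under every three-character slice it contains, a query is answered by intersecting the sets for the query's own trigrams, and the candidates are then verified, because sharing all trigrams does not guarantee that the fragment occurs contiguously.

```python
from collections import defaultdict

def trigrams(word):
    """Return the set of all 3-character slices of word."""
    return {word[i:i + 3] for i in range(len(word) - 2)}

def build_trigram_index(words):
    """Map each trigram to the set of words that contain it."""
    index = defaultdict(set)
    for w in words:
        for g in trigrams(w):
            index[g].add(w)
    return index

def search(index, fragment):
    """Return the indexed words containing fragment (3+ characters)."""
    grams = trigrams(fragment)
    if not grams:
        raise ValueError("fragment must have at least 3 characters")
    # Candidates must contain every trigram of the fragment.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verify, since the trigrams could occur in a different arrangement.
    return sorted(w for w in candidates if fragment in w)

# Hypothetical sample vocabulary.
words = ["carpets", "carrots", "cars", "scarf", "art"]
index = build_trigram_index(words)
print(search(index, "car"))
print(search(index, "rrot"))
```

The storage cost mentioned in the comment is visible here: a word of length k contributes up to k - 2 index entries instead of one.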
A: 

Gmail search is actually not that great. It does not, for example, handle stemming (try searching for "ARTS" vs. "ART") and does not search through your attachments. Search improvements are an often-requested enhancement. As with most things Google, it's probably already on the dev path...

Sorry!

J

Cloudbreak NZ