Consider the following search results:
- Google for 'David' - 591 million hits in 0.28 sec
- Google for 'John' - 785 million hits in 0.18 sec
OK. Pages are indexed, so the engine only needs to look up the count and the first few items in the index table. That speed is understandable.
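To make that mental model concrete, here is a toy sketch in Python of how I assume a single-word lookup works. The names `build_index` and `lookup` and the three sample pages are made up for illustration; a real search engine is certainly far more elaborate than this.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the sorted list of page ids that contain it."""
    index = defaultdict(list)
    for page_id in sorted(pages):
        # set() so a word repeated on one page is recorded only once
        for word in set(pages[page_id].lower().split()):
            index[word].append(page_id)
    return index

def lookup(index, word, first_n=10):
    """Return (hit count, first few page ids) for a single word."""
    postings = index.get(word.lower(), [])
    return len(postings), postings[:first_n]

# Hypothetical sample data, not real search results
pages = {
    1: "David met John",
    2: "John wrote a book",
    3: "David plays chess",
}
index = build_index(pages)
print(lookup(index, "David"))  # (2, [1, 3])
```

The lookup never touches the page text: the count and the first hits come straight from the list stored under the word, which is why a single-word search being fast does not surprise me.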
Now consider the following search with an AND operation:
- Google for 'David John' ('David' AND 'John') - 173 million hits in 0.25 sec
This one has me ticked ;) How on earth can search engines compute the result of an AND operation on gigantic datasets so fast? I see the following two ways to do it, and both are terrible:
- Search for 'David', take the resulting gigantic temp table, and search it for 'John'. HOWEVER, the temp table is not indexed by 'John', so a brute-force scan is needed. That just won't finish within 0.25 sec no matter what hardware you have.
- Index by all possible word combinations, like 'David John'. Then we face a combinatorial explosion in the number of keys, and not even Google has the storage capacity to handle that.
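Here is a toy sketch in Python of the two approaches as I picture them. The sample pages, the hand-written index, the function names, and the vocabulary size of 1,000,000 words are all assumptions chosen for illustration, not anything a real engine is known to use.

```python
from math import comb

# Hypothetical sample data: page texts and a per-word index over them
pages = {1: "david met john", 2: "john wrote a book", 3: "david plays chess"}
index = {"david": [1, 3], "john": [1, 2]}

def and_by_rescanning(first, second):
    """Approach 1: fetch the hits for `first` from the index,
    then scan the text of every hit for `second` (no index used)."""
    hits = index.get(first, [])
    return [pid for pid in hits if second in pages[pid].split()]

def pair_key_count(vocabulary_size):
    """Approach 2: number of keys needed to index every unordered
    pair of distinct words in the vocabulary."""
    return comb(vocabulary_size, 2)

print(and_by_rescanning("david", "john"))  # [1]
print(pair_key_count(1_000_000))           # 499999500000
```

The first function has to read every page in the 'David' result, which is what I mean by brute force. The second shows the key count for word pairs alone under the assumed vocabulary size, before even considering queries with three or more words.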
And you can AND together as many search terms as you want and still get answers in under 0.5 sec! How?