One fact that I've always found funny is that Google is, in fact, run by bioinformatics (’kay, I find that funny because I'm a bioinf…thingy). Let me explain.
Early on, bioinformatics faced the challenge of searching for small texts in gigantic strings very fast. For us, the “gigantic string” is of course DNA. Often not a single DNA but a database of several DNAs from different species/individuals. The small texts are proteins or their genetic counterpart, a gene. Much of the first work of computational biologists was restricted to finding homologies between genes. This is done to establish the function of newly found genes by noting similarities to genes whose function is already known.
Now, these DNA strings get very big indeed, and (approximate, error-tolerant!) search has to be done extremely efficiently. Much of the modern theory of string lookup was thus developed in the context of computational biology.
However, quite some time ago, the possibilities of conventional text search were exhausted. A new approach was needed that allowed searching large strings in sublinear time, that is, without looking at every single character. It was discovered that this can be solved by pre-processing the large string and building a special index data structure over it. Many such data structures have been proposed. Each has its strengths and weaknesses, but one is especially remarkable because it allows a lookup in constant time. Now, at the orders of magnitude at which Google operates, this isn't strictly true anymore, because load balancing across servers, preprocessing and some other sophisticated stuff have to be taken into account.
But in essence, the so-called q-gram index allows a lookup in constant time. The only disadvantage: the data structure gets ridiculously big. Essentially, to allow for a lookup of strings with up to q characters (hence the name), it requires a table that has one field for each possible combination of q letters, that is, S^q fields, where S is the size of the alphabet, say 36 (= 26 letters + 10 digits). Additionally, there has to be one field for each letter position in the string that was indexed (or, in the case of Google, for each website).
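To make the idea concrete, here's a minimal sketch in Python (a toy, obviously not Google's actual code): it builds a q-gram index over a small “DNA” string, using a hash table in place of the full S^q table, and looks up a pattern with a single table access.

```python
from collections import defaultdict

def build_qgram_index(text, q):
    """Map every q-gram of `text` to the positions where it occurs.

    Conceptually this is the big table described above: one entry per
    possible q-gram (S^q of them in the worst case), each holding the
    positions -- or, at Google's scale, the websites -- it occurs in.
    """
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index

def lookup(index, pattern):
    """Look up a pattern of exactly q characters.

    One hash-table access fetches the position list; no character of
    the indexed text is scanned at query time, hence 'constant time'.
    """
    return index.get(pattern, [])

# Toy usage: index a tiny 'genome' and find a short pattern.
genome = "ACGTACGTGACG"
idx = build_qgram_index(genome, q=3)
print(lookup(idx, "ACG"))   # -> [0, 4, 9]
```

All the work happens at indexing time; the query itself never touches the indexed string, which is exactly the trade-off (huge table, instant lookup) described above.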
To mitigate the sheer size, Google will probably use multiple indices (in fact, they do, to offer services like spelling correction). The topmost ones won't work on the character level but on the word level instead. This reduces q, but it makes S effectively unbounded, so they will have to use hashing and collision tables to cope with the sheer number of distinct words.
On the next level down, these hashed words will point to other index data structures which, in turn, hash character q-grams that point to the actual websites.
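Again as a rough sketch (my guess at the shape of such a layer, not Google's implementation): the topmost, word-level layer can be pictured as an inverted index, where a hash table maps each word to the set of pages containing it. The page names below are made up for illustration.

```python
from collections import defaultdict

def build_word_index(pages):
    """Word-level inverted index: each word maps to the pages it occurs on.

    Python's dict hashes the keys and resolves collisions internally,
    which is precisely the role of the hashing/collision tables
    mentioned above (the vocabulary is unbounded, so a plain S^q
    table is out of the question at the word level).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Hypothetical pages, purely for illustration.
pages = {
    "example.com/a": "the quick brown fox",
    "example.com/b": "the lazy dog",
}
idx = build_word_index(pages)
print(idx["the"])   # -> {'example.com/a', 'example.com/b'} (order may vary)
```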
Long story short, these q-gram index data structures are arguably the most central part of Google's search algorithm. Unfortunately, there are no good non-technical papers explaining how q-gram indices work. The only publication I know of that contains a description of how such an index works is … alas, my bachelor thesis.