Hi!

I'd like to get some ideas on how I can create an efficient MySQL database that can handle high-traffic auto-complete requests, like Google's auto-updating SERP feature.

The key to this is that I am trying to take the content of my book and index the text in a way such that the database can return the relevant text with the least overhead possible.

For Example:
If I were to type the text "as", the site would essentially scour the database (the entire book) and show a result set of sentences in the book such as:

"...that is as I was saying..."  
"as I intended..."  
"This is as good as it gets"  
...  

But as soon as I type a "k" and it spells "ask", the result set changes to, e.g.:

"Ask your father..."  
"...I will ask you to do this."  
...  

In addition, I was considering adding helper words: if you are in the middle of typing "askew" but have only typed "ask" so far, the database would grab all words containing "ask", so you would see helper words like "asking", "askew", "asked", and "ask".
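As a quick illustration of that helper-word idea, here is a minimal Python sketch of a prefix lookup over a sorted word list (the examples match on prefixes, which covers the "ask" → "asking"/"askew"/"asked" case; the word list and function name are illustrative, not taken from the book):

```python
import bisect

def prefix_matches(sorted_words, prefix):
    """Return every word in sorted_words that starts with prefix."""
    # All matches form a contiguous run in a sorted list, so two
    # binary searches bound the run without scanning everything.
    lo = bisect.bisect_left(sorted_words, prefix)
    hi = bisect.bisect_left(sorted_words, prefix + "\uffff")
    return sorted_words[lo:hi]

words = sorted(["ask", "asked", "askew", "asking", "assay", "assert"])
print(prefix_matches(words, "ask"))  # ['ask', 'asked', 'askew', 'asking']
```

In MySQL, the equivalent of this contiguous-run trick is a `LIKE 'ask%'` query against an indexed column, which can use the index as long as the wildcard is only at the end.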

Importantly, this book is a rule book, so each rule has its own unique key. Thus, when a person types in a word or two, all rules containing that combination of words will show up in the result set.

I am willing to do my own research beyond whatever help anyone chooses to give. I am at a loss for the kinds of keywords I should be searching for on this subject, so in addition to specific solutions, keywords for this type of database structure would also be appreciated and helpful.

I have read something about full-text search. Could that be a solution, or is it not efficient enough for my purposes?

I already know how to do AJAX calls and auto-completion; that is not what I am asking about. What I need is an understanding of how to structure and index the database so that, when I write a script to parse the content of my book in text format and insert the tokens into the database, they can later be pulled in the most efficient way. I expect a high level of traffic on the site eventually, so minimizing request overhead is of paramount importance.

As an initial idea, I was thinking of something like a table for each character length greater than one; thus I'd have tables called "two_letters", "three_letters", etc.

One example record in the "two_letters" table could be "as", which has a many-to-many relationship with every rule in the book that contains "as"; thus:

"as", "125"   
"as", "2024"  
"as", "4"  

Of course, the smaller the letter set, the larger the table will be. This book is very big, so we're talking millions of records here: one for each two-letter combination and the rule it is associated with. Then do it all over again with three-letter combinations, and so on until there are no more words. This is only an initial brainstorming attempt and may be a terrible idea, but it's my first thought on this.
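Before committing to per-length MySQL tables, the same associations can be prototyped in memory as a single prefix-to-rule-id inverted index. A minimal Python sketch, using hypothetical rule texts and ids (nothing here is from the actual book), that stores what the proposed "two_letters"/"three_letters" tables would hold in one structure:

```python
import re
from collections import defaultdict

def build_prefix_index(rules, max_prefix=5):
    """Map each word prefix (length 2..max_prefix) to the set of rule ids containing it."""
    index = defaultdict(set)
    for rule_id, text in rules.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            # Emit "as", "ask", "aske", ... for each word, capped at max_prefix.
            for n in range(2, min(len(word), max_prefix) + 1):
                index[word[:n]].add(rule_id)
    return index

# Hypothetical rules keyed by their unique ids
rules = {
    125: "That is as I was saying",
    2024: "Ask your father",
    4: "This is as good as it gets",
}
index = build_prefix_index(rules)
print(sorted(index["as"]))   # ids of rules containing a word that starts with "as"
print(sorted(index["ask"]))  # only the rule containing "ask"
```

In MySQL terms, each (prefix, rule_id) pair would become one row in a single junction table with an index on the prefix column, rather than a separate table per prefix length; capping the prefix length bounds the record count.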

Once the script is run, it will create the tables and insert the records as it goes, likely reading over the content multiple times, once for each character length.

I want it to recognize multi-word combinations as well, just as a key phrase in Google is auto-updated in the SERP. Thus, as the user types "they are go", you might find:

"they are gone already..."  
"they are going to the movies later."  
"how they are gonna get there is..."  
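That multi-word behavior can be sketched as a match where all typed words must appear in order and only the final word is treated as a prefix. A rough Python illustration (the sentences are the examples above; a real system would consult an index rather than scanning every sentence per keystroke):

```python
import re

def phrase_prefix_match(sentences, typed):
    """Return sentences containing the typed words in order, last word as a prefix."""
    words = typed.lower().split()
    head = [re.escape(w) for w in words[:-1]]      # earlier words: exact matches
    tail = re.escape(words[-1]) + r"\w*"           # final word: prefix match ("go" -> "gone")
    pattern = re.compile(r"\b" + r"\s+".join(head + [tail]))
    return [s for s in sentences if pattern.search(s.lower())]

sentences = [
    "they are gone already",
    "they are going to the movies later",
    "how they are gonna get there is unknown",
    "they were going home",
]
print(phrase_prefix_match(sentences, "they are go"))  # first three only
```

The last sentence is excluded because "they were going" breaks the word order, which is exactly the distinction the typed phrase "they are go" should enforce.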

I am essentially asking for that exact auto-complete feature in Google, but the content is a book, not indexed websites on the internet.

I look forward to hearing from some of the geniuses out there that get what I'm asking for here and feel like impressing some people! :)

Thanks in advance to everyone.

A: 

I have to recommend Sphinx. It's an amazing search engine for data stored in MySQL (or other databases).

bemace