Hello guys,

I was thinking about text-driven search on user input. Often you are searching a database of addresses, where you can find customers and so on.

Does anybody have an idea how to find out which of the typed words is the person's name, which is the street name, and which is the company name? And secondly, if the name is a double name like "Lee Harvey", how can I find out that the two words Lee and Harvey belong together? The same problem comes up with company names like "frank the baker inc."...

Is there an algorithm or best-practice strategy? Thanks for links, tutorials, scripts and any other help ;-)

A: 

Don't bother classifying the words, just perform a full-text search. Then check, for each result item, which field contains the search terms. You may also display the items in separate lists (terms found in name, terms found in address). The only difficulty is when John Smith lives on John Smith Street: then you must decide which list or lists the result item belongs to.
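A minimal sketch of that idea, assuming the records are plain dictionaries. The field names (`name`, `street`, `company`), the sample data and the `search` helper are all made up for illustration; a real database would use its own full-text search instead of this loop.

```python
# Hypothetical records; the field names are assumptions for illustration.
RECORDS = [
    {"name": "John Smith", "street": "12 Baker Road", "company": "Frank the Baker Inc."},
    {"name": "Lee Harvey", "street": "4 John Smith Street", "company": "Harvey Tools"},
]


def search(records, query):
    """Return (record, matched_fields) pairs for records that contain every term.

    matched_fields maps a field name to the query terms found in that field,
    so the caller can decide which result list(s) the record belongs to.
    """
    terms = query.lower().split()
    if not terms:
        return []
    results = []
    for record in records:
        matched = {}
        for field, value in record.items():
            # Naive tokenizer: lowercase, drop periods, split on whitespace.
            words = value.lower().replace(".", "").split()
            hits = [term for term in terms if term in words]
            if hits:
                matched[field] = hits
        found = {term for hits in matched.values() for term in hits}
        if found == set(terms):
            results.append((record, matched))
    return results


if __name__ == "__main__":
    for record, fields in search(RECORDS, "john smith"):
        print(record["name"], "matched in", sorted(fields))
```

With the sample data, the query "john smith" hits one record in its `name` field and another in its `street` field, which is exactly the case where you have to decide which list to show the item in.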

ern0
+1  A: 

What you basically want is a search engine :) Here are the basic steps you need to follow -

  1. You need to create an 'Inverted Index' of the content you want to search.
  2. The index is a set of 'name'=>'value' pairs. You can structure these pairs in whichever way you want (tuned to your data & needs).

E.g. for your problem of double names, you could split all your names into single words & index them like so -

 'lee'=>'lee harvey'
 'harvey'=>'lee harvey'
 ...

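This way, when anyone searches for 'lee' they get 'lee harvey'. There is another, often better, approach to this called "n-gram" indexing, which indexes short character sequences instead of whole words and so also matches partial or misspelled input. Check it out...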
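Both ideas can be sketched in a few lines of Python. This is only an illustration: the function names, the padding scheme and the `min_overlap` threshold are my own assumptions, not a standard API.

```python
from collections import defaultdict


def build_word_index(names):
    """Inverted index: each lowercase word maps to the full names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.lower().split():
            index[word].add(name)
    return index


def trigrams(text):
    """Set of character 3-grams, padded with spaces so word edges count too."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_trigram_index(names):
    """Inverted index keyed by character trigram instead of whole word."""
    index = defaultdict(set)
    for name in names:
        for gram in trigrams(name):
            index[gram].add(name)
    return index


def fuzzy_lookup(index, query, min_overlap=0.5):
    """Names that share at least min_overlap of the query's trigrams."""
    grams = trigrams(query)
    counts = defaultdict(int)
    for gram in grams:
        for name in index.get(gram, ()):
            counts[name] += 1
    return sorted(n for n, c in counts.items() if c / len(grams) >= min_overlap)


if __name__ == "__main__":
    names = ["lee harvey", "frank the baker inc."]
    print(build_word_index(names)["lee"])
    print(fuzzy_lookup(build_trigram_index(names), "harvy"))
```

The word index only answers exact word lookups, while the trigram index still finds 'lee harvey' for a misspelled query such as 'harvy', at the cost of a larger index.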

You could build separate indexes of names, addresses, emails etc. & when the user types a query, check it against all your indexes with the approach suggested above. After you get the results, merge them. Maybe you could introduce the notion of rank so that you can sort your results & show the latest or most relevant ones at the top. For this you need to figure out a way to score your terms...
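One possible way to do the merge and ranking, as a rough sketch. The sample customers, the field names and the weights in `FIELD_WEIGHTS` are assumptions picked for the example; you would tune the scoring to your own data.

```python
from collections import defaultdict

# Hypothetical customer records keyed by id; field names are assumptions.
CUSTOMERS = {
    1: {"name": "Lee Harvey", "street": "Main Street", "company": "Frank the Baker Inc."},
    2: {"name": "Frank Miller", "street": "Harvey Lane", "company": "Miller Tools"},
}

# Assumed weights: a hit in the name counts more than a hit in the street.
FIELD_WEIGHTS = {"name": 3.0, "company": 2.0, "street": 1.0}


def build_indexes(customers):
    """One inverted index per field: field -> word -> set of customer ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for cid, record in customers.items():
        for field, value in record.items():
            for word in value.lower().split():
                indexes[field][word.strip(".,")].add(cid)
    return indexes


def ranked_search(indexes, query):
    """Check every term against every index and merge the hits into one score.

    Returns (customer_id, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        for field, index in indexes.items():
            for cid in index.get(term, ()):
                scores[cid] += FIELD_WEIGHTS.get(field, 1.0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    indexes = build_indexes(CUSTOMERS)
    for cid, score in ranked_search(indexes, "lee harvey"):
        print(cid, CUSTOMERS[cid]["name"], score)
```

Because each index remembers which field a word came from, the same lookup also tells you whether 'harvey' was found as a name or as a street, which is what the original question was after.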

MovieYoda