Hi,
I have been tasked with coming up with a routine that will suggest alternative domain names to register if the customer's originally requested domain name is already registered.

The first step, I think, would be to split the requested domain back into its component words so that I could work out alternatives to try.

e.g. mybigredtruck.com would be broken up into "my", "big", "red" & "truck"

Then I would need some way of working out alternatives for these.

Does anybody know of any methods, components or web services that could do any of these functions? Any ideas will be gratefully accepted.

A: 

The most common implementation of suggestion algorithms that I have seen is to prepend or append relevant words. For domain names, the most common is to change the top-level domain (.com, .net, .gov, etc).
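
As a rough illustration of that approach, here is a minimal Python sketch. The function name `suggest_alternatives` and the default word and TLD lists are made up for the example, and it assumes the input is a bare name plus a single TLD (no "www." prefix). Checking whether each candidate is actually available is left out.

```python
def suggest_alternatives(domain,
                         prefixes=("my", "the", "get"),
                         suffixes=("online", "hq", "1"),
                         tlds=(".com", ".net", ".org")):
    """Return candidate domains built by swapping the TLD and by
    prepending or appending short words to the requested name.

    The word and TLD lists are placeholders; a real service would
    pick words relevant to the customer's business.
    """
    # Split "mybigredtruck.com" into "mybigredtruck" and ".com".
    name, dot, rest = domain.lower().partition(".")
    tld = dot + rest

    candidates = []
    # Same name under a different top-level domain.
    for other in tlds:
        if other != tld:
            candidates.append(name + other)
    # Original TLD with a word added in front of or after the name.
    for word in prefixes:
        candidates.append(word + name + tld)
    for word in suffixes:
        candidates.append(name + word + tld)
    return candidates


if __name__ == "__main__":
    for candidate in suggest_alternatives("mybigredtruck.com"):
        print(candidate)
```

Each candidate would still need to be run through whatever availability lookup the registrar already uses before being shown to the customer.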

As far as splitting a delimiter-less string by the most likely English words, I think you may be in for a rough time.

A Google search for "mybigredtruck" doesn't suggest "my big red truck" as an alternate search. To me, that implies that the algorithm is extremely complex, if one even exists.

Bobwise
-1: splitting a spaceless string into words is not that difficult
BlueRaja - Danny Pflughoeft
I have to strongly agree with Bobwise. Stick with appending digits to the name, or changing the suffix. Unless computing with natural language and dictionaries is an area of expertise for you, don't even consider that route. Thinking as a programming manager, I would say you're way over-thinking this one. If your manager really expects the word-based solution from you, then do the 30-minute version first, check it in, and then explain you'll need another 4 weeks to prototype the complex version.
Detmar
I did originally think that I might be able to send Ajax queries off to Google for search suggestions and use that to predict the word boundaries, extending the query string one character at a time until I matched part of the requested string. Thanks for your time in responding.
Jonathan Stanton
BlueRaja - Where would you recommend I look for information on splitting such a string?
Jonathan Stanton
@Jonathan: See my comment above
BlueRaja - Danny Pflughoeft
+1  A: 

Here is a good place to start with a matching algorithm:

  • Obtain a dictionary of words

  • Remove nonalphabetic characters from the input string

  • Remove the TLD extension from the input string

  • Assuming that the input text is spelt correctly, attempt to match it with a dictionary entry; if it does not match (as with undelimited concatenated words), try one character fewer in a loop until it matches. Store the match, but keep looking for all other matches. Do the same for the remainder of the string.

The correct match would be the one where all substrings of the full input string are matched, e.g., www.soilofgarden.com = 'soil of garden' and not 'so?? of garden'
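
The steps above can be sketched in Python roughly as follows. This is only one possible reading of them: the tiny `WORDS` set stands in for a real dictionary, the names `strip_domain` and `segment` are made up for the example, and the TLD is removed before the non-alphabetic characters so that the dot is still there to split on. The search tries the longest dictionary prefix first and backtracks to shorter ones whenever the remainder cannot be fully matched.

```python
from functools import lru_cache

# Placeholder dictionary; a real one would be loaded from a word list.
WORDS = frozenset({"my", "big", "red", "truck",
                   "so", "soil", "of", "garden"})


def strip_domain(domain):
    """Drop a leading "www.", the TLD, and any non-alphabetic characters."""
    name = domain.lower()
    if name.startswith("www."):
        name = name[4:]
    name = name.split(".", 1)[0]  # keep only the part before the first dot
    return "".join(ch for ch in name if ch.isalpha())


def segment(text, words=WORDS):
    """Split text into dictionary words covering the whole string.

    Returns a list of words, or None if no complete split exists.
    """
    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(text):
            return ()  # reached the end: everything so far matched
        # Try the longest candidate first, then one character fewer.
        for end in range(len(text), start, -1):
            piece = text[start:end]
            if piece in words:
                rest = solve(end)
                if rest is not None:  # remainder is fully matched too
                    return (piece,) + rest
        return None  # no dictionary word starts a full split here

    result = solve(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    print(segment(strip_domain("www.soilofgarden.com")))
    print(segment(strip_domain("mybigredtruck.com")))
```

Because `segment` returns only the first complete split it finds, an ambiguous name would need a variant that collects every split and ranks them, for example by word frequency.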

Carnotaurus