
I've noticed that users sometimes mistype their email address in a contact-us form, for example typing @yahho.com, @yhoo.com, or @yahoo.co instead of @yahoo.com.

I feel this could be caught on the spot with some JavaScript: check the email address for likely mistakes such as the ones listed above, so that if the user types an address at one of those misspelled domains, an unobtrusive message can be displayed suggesting that they probably meant @yahoo.com and asking them to double-check that they typed their email correctly.

The question is:
How can I detect in JavaScript that a string is very similar to "yahoo" or "yahoo.com"? More generally, how can I measure the level of similarity between two strings?

P.S. (a side note): in my specific case, the users are not native English speakers, most of them are nowhere near fluent, and the site itself is not in English.

A: 

It might be possible to use a regex, but personally, it would take me far too long to write one I'd be happy with: one that catches all the likely permutations without causing too many false positives.

So, here's what I would do:

  • Hard-code a list of all the common typing errors.
  • Use a case-insensitive string comparison to compare the domain part of the email against each string in the list.
  • If there's a match, display a warning: "Did you mean yahoo.com?"

Yeah, it's not very pretty, but it doesn't seem (at least from your question) like you'll have many cases to check, so it should perform just fine. It also doesn't seem (at least to me) like something worth putting a lot of time into, so this is an incredibly simple solution that could be done in about 15-30 minutes.
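The three steps in the list above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the names `COMMON_TYPOS` and `suggestFromTypoList` are hypothetical, and the table only contains the three misspellings mentioned in the question. Lower-casing the domain before the lookup gives the case-insensitive comparison.

```javascript
// Hypothetical table of known misspellings -> intended domain.
// Only the typos from the question are listed; extend as needed.
const COMMON_TYPOS = new Map([
  ['yahho.com', 'yahoo.com'],
  ['yhoo.com', 'yahoo.com'],
  ['yahoo.co', 'yahoo.com'],
]);

// Returns the suggested domain for a known typo, or null otherwise.
function suggestFromTypoList(email) {
  const at = email.lastIndexOf('@');
  if (at === -1) return null; // no domain part to check
  // Domains are case-insensitive, so normalize before the lookup.
  const domain = email.slice(at + 1).toLowerCase();
  return COMMON_TYPOS.get(domain) || null;
}
```

For example, `suggestFromTypoList('someone@Yahho.com')` returns `'yahoo.com'`, which could then be shown as "Did you mean yahoo.com?", while a correctly typed `someone@yahoo.com` returns `null`.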

Daniel Schaffer
+2  A: 

Check out SOUNDEX and DIFFERENCE: if you use Ajax, you can have SQL Server compare the SOUNDEX value of the words against "correct" domains and send suggestions back. It is also possible to write your own version of Soundex (it's not that complicated).

http://stackoverflow.com/questions/299949/sql-servers-soundex-function-on-non-latin-character-sets

http://stackoverflow.com/questions/270771/data-structure-for-soundex-algorithm

http://stackoverflow.com/questions/41424/how-do-you-implement-a-did-you-mean
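As the answer says, Soundex is simple enough to implement yourself, which would also keep the check on the client. Below is a minimal sketch of the classic American Soundex (keep the first letter, map consonants to digits, drop vowels, collapse adjacent duplicate codes, pad to four characters). The names `soundex` and `SOUNDEX_CODES` are hypothetical, and this is not SQL Server's implementation; it does not reproduce `DIFFERENCE`.

```javascript
// Soundex digit for each coded consonant; vowels, y, h and w have no code.
const SOUNDEX_CODES = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3',
  l: '4',
  m: '5', n: '5',
  r: '6',
};

function soundex(word) {
  // Only Latin letters are coded; everything else is ignored.
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';

  let result = letters[0].toUpperCase();
  let lastCode = SOUNDEX_CODES[letters[0]] || '';

  for (let i = 1; i < letters.length && result.length < 4; i++) {
    const ch = letters[i];
    const code = SOUNDEX_CODES[ch] || '';
    // Append a digit unless it repeats the previous letter's code.
    if (code !== '' && code !== lastCode) {
      result += code;
    }
    // Vowels separate equal codes (they reset lastCode); h and w do not.
    if (ch !== 'h' && ch !== 'w') {
      lastCode = code;
    }
  }
  return result.padEnd(4, '0');
}
```

For instance, `soundex('yahho')` and `soundex('yahoo')` are both `'Y000'`, and `soundex('hotmial')` equals `soundex('hotmail')`. Because Soundex ignores vowels, short names like "yahoo" collapse to a very coarse code, so it is likely to produce more false matches than an edit-distance check.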

Stefan
Seems like a bit of overkill for a "contact us form", no?
Daniel Schaffer
@Daniel, a simple Soundex function can be written in fewer than 20 lines of code. But almost "everything" is overkill in a "contact us form". :)
Stefan
Well I suppose it's rather telling that all my "Contact Us" forms are mailto: links...
Daniel Schaffer
+4  A: 

In addition to Soundex, you may also want to look at algorithms for computing the Levenshtein distance.
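For reference, the standard definition: for strings a (length m) and b (length n), let lev(i, j) be the distance between the first i characters of a and the first j characters of b, and let [a_i ≠ b_j] be 1 when those characters differ and 0 otherwise. The distance between the full strings is lev(m, n).

$$
\operatorname{lev}(i,j) =
\begin{cases}
i & \text{if } j = 0,\\
j & \text{if } i = 0,\\
\min\bigl(\operatorname{lev}(i-1,j)+1,\ \operatorname{lev}(i,j-1)+1,\ \operatorname{lev}(i-1,j-1)+[a_i \ne b_j]\bigr) & \text{otherwise.}
\end{cases}
$$

The three terms are a deletion, an insertion, and a substitution (free when the characters already match). Each typo from the question is at distance 1 from "yahoo.com": "yahho.com" needs one substitution (h to o), "yhoo.com" needs one insertion (the missing a), and "yahoo.co" needs one insertion (the missing m).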

Abie
It seems that Levenshtein is what I'm after!
hasen j
+1  A: 

Of course, as a first step, you could strip out the domain name and do a DNS lookup; that should at least tell you whether the domain appears to be legitimate.
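A minimal sketch of that first step, assuming it runs server-side in Node.js: browsers expose no DNS lookup API to page scripts, so this does not meet a client-only requirement. It uses Node's built-in `dns.promises.resolveMx`; the function name `domainHasMailServer` is hypothetical. A missing MX record is not conclusive on its own, since mail delivery falls back to the domain's A/AAAA record when no MX exists, and in this sketch a transient network error is also reported as `false`.

```javascript
const dns = require('dns');

// Hypothetical helper: true if the email's domain publishes at least one MX record.
async function domainHasMailServer(email) {
  const at = email.lastIndexOf('@');
  // No '@' or nothing after it: there is no domain to look up.
  if (at === -1 || at === email.length - 1) return false;
  const domain = email.slice(at + 1);
  try {
    const records = await dns.promises.resolveMx(domain);
    return records.length > 0;
  } catch (err) {
    // ENOTFOUND, ENODATA, or a network failure: treat as "not verified".
    return false;
  }
}
```

Usage would look like `domainHasMailServer('someone@yahho.com').then(ok => { /* warn if !ok */ })`.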

Software Monkey
I just want a simple client-side check, with no network connections.
hasen j
+7  A: 

Here's a quick-and-dirty implementation that gives you some simple checks using the Levenshtein distance. Credit for the "levenshteinenator" goes to this link. Add whatever popular domains you want to the domains array; the code then checks whether the distance between each of them and the host part of the entered email is 1 or 2, which is close enough to reasonably assume there's a typo somewhere.

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn string a into string b.
var levenshteinenator = function(a, b) {
  var cost;

  // get values
  var m = a.length;
  var n = b.length;

  // r[i][j] holds the distance between the first i characters of a
  // and the first j characters of b (the full table uses O(m*n) space)
  var r = [];
  r[0] = [];
  for (var c = 0; c < n + 1; c++) {
    r[0][c] = c; // "" -> first c chars of b: c insertions
  }

  for (var i = 1; i < m + 1; i++) {
    r[i] = [];
    r[i][0] = i; // first i chars of a -> "": i deletions
    for (var j = 1; j < n + 1; j++) {
      cost = (a.charAt(i - 1) === b.charAt(j - 1)) ? 0 : 1;
      r[i][j] = minimator(
        r[i - 1][j] + 1,        // deletion
        r[i][j - 1] + 1,        // insertion
        r[i - 1][j - 1] + cost  // substitution (free if the characters match)
      );
    }
  }

  return r[m][n];
};

// return the smallest of the three values passed in (ties included)
var minimator = function(x, y, z) {
  return Math.min(x, y, z);
};

var domains = ['yahoo.com', 'google.com', 'hotmail.com'];
var email = 'someone@yahho.com'; // hypothetical example address
var parts = email.split('@');
// domains are case-insensitive, so compare in lower case
var host = (parts[1] || '').toLowerCase();
var dist;
for (var x = 0; x < domains.length; x++) {
  dist = levenshteinenator(domains[x], host);
  // 0 is an exact match; 1 or 2 is close enough to suggest a typo
  if (dist == 1 || dist == 2) {
    alert('did you mean ' + domains[x] + '?');
  }
}
Paolo Bergantino
Nice one! +1 (I assume it works) ;-)
Stefan
+1, bonus for providing an implementation
hasen j