This may be a silly question, but I haven't been able to find this information. If you know the names of the relevant concepts or suitable references, please let me know.

I'd like to understand how I should validate a given named ID for a generic entity, such as an email login, the way Yahoo, Google and Microsoft do.

I mean: if you already have a user named foo, an attempt to create foo2 will be denied, as it is likely to be someone trying to mislead users with a fake ID.

+1  A: 

You're going to have to take a two-pass approach.

The first pass is a regular expression that validates, as far as possible, that the entity name meets your specification, for example by disallowing certain characters.
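A minimal sketch of that first pass, assuming Python. The policy itself (3 to 30 characters, starting with a letter, then letters, digits, dots or underscores) and the names `USERNAME_RE` and `is_well_formed` are made up for illustration; substitute whatever your specification requires.

```python
import re

# Hypothetical policy: 3-30 characters, first one a letter,
# the rest lowercase letters, digits, dots or underscores.
USERNAME_RE = re.compile(r"[a-z][a-z0-9._]{2,29}")

def is_well_formed(name: str) -> bool:
    """Return True if the lowercased name satisfies the assumed format policy."""
    return USERNAME_RE.fullmatch(name.lower()) is not None
```

Note that this only checks the format: both foo and foo2 pass, so telling them apart is left to the second pass.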

The second pass is to perform some type of fuzzy search when the name is created. This could be as simple as a LIKE '%value%' WHERE clause or as complicated as a full-text search that limits hits to a certain relevance rating.
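The simple end of that range might look roughly like the sketch below, assuming Python with the standard `sqlite3` module. The `users` table, its `name` column and the `similar_names` helper are hypothetical. The query checks containment in both directions, so that foo2 is flagged when foo already exists.

```python
import sqlite3

def similar_names(conn: sqlite3.Connection, candidate: str) -> list[str]:
    """Return existing names that contain the candidate or are contained in it."""
    rows = conn.execute(
        "SELECT name FROM users "
        "WHERE name LIKE '%' || ? || '%' "   # existing name contains candidate
        "   OR ? LIKE '%' || name || '%'",   # candidate contains existing name
        (candidate, candidate),
    )
    return [row[0] for row in rows]

# Tiny in-memory example with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [("foo",), ("alice",)])

print(similar_names(conn, "foo2"))  # ['foo']
print(similar_names(conn, "bob"))   # []
```

One caveat: `%` and `_` are LIKE wildcards, so names containing them would need escaping (or rejecting in the first pass) before being used this way.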

That said, I would guess the error rate of the matching (both false positives and false negatives) would be high enough to justify not doing this.

Good luck.

Chris Lively
I thought so. I also considered Levenshtein and Hamming distance, perhaps with an acceptable length-to-diff ratio as the threshold.
aldrinleal
+1  A: 

What comes to mind:

  • Levenshtein Distance
  • Hamming Distance
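
For reference, both distances fit in a few lines. This is only a sketch, assuming Python; `too_similar` and its `max_ratio` default of 0.25 are illustrative guesses at the length-relative threshold mentioned in the comments, not tested values.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def too_similar(candidate: str, existing: str, max_ratio: float = 0.25) -> bool:
    """Flag names whose edit distance is small relative to their length.

    max_ratio is an arbitrary illustrative threshold.
    """
    longest = max(len(candidate), len(existing))
    if longest == 0:
        return True
    return levenshtein(candidate, existing) / longest <= max_ratio

print(levenshtein("foo", "foo2"))  # 1
print(too_similar("foo2", "foo"))  # True
```

Hamming distance only applies to strings of equal length, so it cannot compare foo with foo2 at all; Levenshtein is the more natural fit for this check.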
Loki
Yes, I thought about that. Lucene might come in handy as well.
aldrinleal