I have an online RPG game which I'm taking seriously. Lately I've been having a problem with users making bogus characters with bogus names that are just a bunch of random letters, like Ghytjrhfsdjfnsdms, Yiiiedawdmnwe, Hhhhhhhhhhejejekk. I force them to change names, but it's becoming too much. What can I do about this?

Could I somehow check that, at the least, a name can't use more than 2 of the same letter beside each other? And maybe also check whether it contains vowels?

+1  A: 

This link might help. You might also be able to plug it through a (possibly modified) speech synthesiser engine and analyse how much trouble it's having generating the speech, without actually generating it.

Chris Dennett
+3  A: 

What if you would use the Google Search API to see if the name returns any results?

Matthew J Morrison
http://www.google.com/search?q=dfjkdfjkd
KennyTM
That gets back to a name seeming "human" - rather than a specific language.
Matthew J Morrison
clever, but not trustable.
Capt Otis
@Capt Otis - I agree
Matthew J Morrison
This seems like a sensible idea, if only to highlight the most ridiculous names in an admin UI
Chris Johnson
@Kenny: oh no, I'm trapped in recursion; the fourth result in that google search is this page!
Andy E
This won't work... Look at Kenny's example... I mean, "fffffffff" returns a bunch of pages.
Peter Ajtai
A: 

It seems as though you are going to need a fairly complex preg function. I don't want to take the time to write one for you, as you will learn more writing it yourself, but I will help along the way if you post some attempts.

http://php.net/manual/en/function.preg-match.php

Capt Otis
Good luck with that. Whether it's code or a regular expression it's still going to be impossible not to have false positives.
Wade Williams
@Wade Williams - is "impossible not to have false positives" a triple negative?
Matthew J Morrison
Yeah good point. But almost no solution is going to be perfect here.
Capt Otis
+11  A: 

I would recommend concentrating your energy on building a user interface that makes it brain-dead easy to list all new names to an administrator, and a big fat "force to rename" mechanism that minimizes the admin's workload, rather than trying to define the incredibly complex and varied rules that make a name (and program a regular expression to match them!).

Update - one thing comes to mind, though: Second Life used to allow you to freely specify a first name (maybe they checked it against a database of first names, I don't know) and then gave you a selection of a few hundred pre-defined last names to choose from. For an online RPG, that may already be enough.

Unicron
Not an answer, but a good answer nonetheless
Yar
@Daniel 'yar' Rosenstark, I don't get such remarks. I mean, only if people simply answer the question being asked, is *that* a true answer to a question? I really hope not. I mean, if someone asks how to build a house with just a hammer, should one try to help this person on his/her way with just the hammer, or should one answer that it might not be a good idea to use only a hammer and suggest other tools as well? I sure hope it's the latter.
Bart Kiers
Adding to this, the main problem with other methods is false positives, but you could use another method to sort by "most likely to be fake".
Brendan Long
@Bart K. thanks. :) But I don't think @Daniel was attacking the answer, quite the contrary. And strictly speaking, my answer *is* arguably not quite what the OP asked for - even though we do think it's for the better that it isn't.
Unicron
@Bart K., I was being facetious, mostly. I also was one of the first upvoters of @Unicron's answer (totally unverifiable, but true :)). SOMETIMES, however (obviously not the OP's case), we are confined to a narrow solution space, but you're right. The answer's update is good too.
Yar
@Unicron, no, I didn't mean that he attacked your answer. I've just seen it happen quite a few times: someone getting an answer that did not address the actual question 100% and then getting a reply that it wasn't really an answer (which is nonsense, IMO).
Bart Kiers
@Daniel, yeah, sorry, I probably came over a bit harsh. It's probably because I've seen the *"Not an answer"* without the part *"but a good answer nonetheless"* and finally decided to give a reply (which I haven't done in the past...). :)
Bart Kiers
... and I finally wanted to use my *house-building-analogy* , of course. :)
Bart Kiers
@Bart K. no worries, we're all trying to use as many cool analogies as possible where applicable.
Yar
+2  A: 

I had this issue as well. An easy way to solve it is to force user names to validate against a database of world-wide names. Essentially you have a database on the backend with a few hundred thousand first and last names for both genders, and make their name match.

With a little bit of searching on Google, you can find many name databases.

George
+2  A: 

Could I somehow check so at least you can't use more than 2 of the same letter beside each other? And also maybe if it contains vowels

If you just want this, you can do:

preg_match('/(.)\\1\\1/i', $name);

This will return 1 if anything appears three times in a row or more.
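
For the vowel part of the question, a second pattern can be added. The following is only a rough sketch: the function name `looksPlausible` and the choice to count "y" as a vowel are assumptions made for illustration, not something from the question.

```php
<?php
// Hypothetical helper; the name and the exact rules are illustrative assumptions.
function looksPlausible(string $name): bool
{
    // Reject any character repeated three or more times in a row.
    if (preg_match('/(.)\1\1/i', $name) === 1) {
        return false;
    }
    // Require at least one vowel ("y" is treated as a vowel here).
    return preg_match('/[aeiouy]/i', $name) === 1;
}
```

Note that a name such as Ghytjrhfsdjfnsdms still passes both checks, so this only catches the crudest cases.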

Artefacto
+6  A: 

You could use a metaphone implementation and then look for "unnatural" patterns:

http://www.php.net/manual/en/function.metaphone.php

This is the PHP function for metaphone string generation. You pass in a string and it returns the phonetic representation of the text. You could, in theory, pass a large number of "human" names and then store a database of valid combinations of phonemes. To test a questionable name, just see if the combinations of phonemes are in the database.
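
One possible way to sketch this idea is to store every pair of adjacent symbols from the metaphone keys of known names, and then accept a candidate only if all of its pairs have been seen. The function names below (`metaphonePairs`, `buildPairSet`, `allPairsKnown`) and the use of adjacent pairs as the "combinations" are assumptions for illustration; only `metaphone()` itself is a built-in.

```php
<?php
// Collect every pair of adjacent symbols from the metaphone key of a name.
function metaphonePairs(string $name): array
{
    $key = metaphone($name);
    $pairs = [];
    for ($i = 0; $i < strlen($key) - 1; $i++) {
        $pairs[] = substr($key, $i, 2);
    }
    return $pairs;
}

// Build a lookup set of all pairs seen in a list of known "human" names.
function buildPairSet(array $knownNames): array
{
    $set = [];
    foreach ($knownNames as $name) {
        foreach (metaphonePairs($name) as $pair) {
            $set[$pair] = true;
        }
    }
    return $set;
}

// A candidate passes only if every one of its pairs was seen before.
function allPairsKnown(string $candidate, array $pairSet): bool
{
    foreach (metaphonePairs($candidate) as $pair) {
        if (!isset($pairSet[$pair])) {
            return false;
        }
    }
    return true;
}
```

In practice the set would be built once from a large name list and stored. Names whose metaphone key is shorter than two symbols produce no pairs and therefore always pass, so this would need to be combined with other checks.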

Hope this helps!

mattbasta
This seems closer to what the OP was looking for. An algorithm has already been documented and implemented: http://www.sil.org/computing/lascruces.html
Kilanash
+3  A: 

I say take @Unicron's approach of easy admin rejection, but on each rejection, add the name to a database of banned names. You might be able to use this data to detect specific attacks that generate large numbers of users based on patterns. It will of course be very difficult to detect one-offs.

sparkey0
Good idea storing away precedents!
Unicron
+3  A: 

Would limiting the number of consonants or vowels in a row, and preventing repeated letters, help? As a regex:

if(preg_match('/[bcdfghjklmnpqrtsvwxyz]{4}|[aeiou]{4}|([a-z])\1{2}/i',$name)){
    //reject
}

Possibly use iconv with ASCII//TRANSLIT if you allow accented characters.
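
A minimal sketch of combining the two, assuming names arrive as UTF-8. The helper name `hasSuspiciousRuns` is hypothetical, and what `//TRANSLIT` produces for a given accented character depends on the iconv implementation and the current locale, so the result for accented input should be tested on the target server.

```php
<?php
// Hypothetical helper: transliterate to ASCII, then apply the run-length checks.
function hasSuspiciousRuns(string $name): bool
{
    // iconv() returns false on failure; fall back to the raw name in that case.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $name);
    if ($ascii === false) {
        $ascii = $name;
    }
    // 4 consonants in a row, 4 vowels in a row, or the same letter 3 times.
    return preg_match(
        '/[bcdfghjklmnpqrstvwxyz]{4}|[aeiou]{4}|([a-z])\1{2}/i',
        $ascii
    ) === 1;
}
```

Be aware that real names such as Schmidt contain four consonants in a row and would be flagged by this pattern.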

Wrikken
A: 

What do you think about delegating the responsibility of creating users to a third party source (like Facebook, Twitter, OpenId...)?

Doing that will not solve your problem, but it will be more work for a user to create additional accounts - which (assuming that the users are lazy, since most are) should discourage the creation of additional "dummy" users.

Matthew J Morrison
+1  A: 

You should try implementing a modified version of a Naive Bayes spam filter. For example, in normal spam detection you calculate the probability of a word being spam and use individual word probabilities to determine if the whole message is spam.

Similarly, you could download a word list, and compute the probability that a pair of letters belongs to a real word.

E.g., create a 26x26 table, say T. Let the 5th row represent the letter e and let entry T(5,1) be the number of times ea appeared in your word list. Once you're done counting, divide each element in each row by the sum of that row, so that T(5,1) is now the fraction of letter pairs starting with e in your word list that are ea.

Now you can use the individual pair probabilities (e.g. for Jimy the pairs would be {Ji, im, my}) to check whether Jimy is an acceptable name or not. You'll probably have to experiment to find the right probability threshold, but try it out; it's not that hard to implement.
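
A rough sketch of the table and the scoring step could look like this. The function names `buildBigramTable` and `nameScore`, the floor value for unseen pairs, and the use of an average log-probability as the score are all assumptions for illustration; the table is stored as a nested array keyed by letters rather than as a literal 26x26 grid.

```php
<?php
// Build row-normalised pair probabilities from a word list.
function buildBigramTable(array $words): array
{
    $counts = [];
    foreach ($words as $word) {
        // Keep only a-z so that every pair is a pair of letters.
        $w = preg_replace('/[^a-z]/', '', strtolower($word));
        for ($i = 0; $i < strlen($w) - 1; $i++) {
            $a = $w[$i];
            $b = $w[$i + 1];
            $counts[$a][$b] = ($counts[$a][$b] ?? 0) + 1;
        }
    }
    // Divide each entry by the sum of its row.
    $table = [];
    foreach ($counts as $a => $row) {
        $total = array_sum($row);
        foreach ($row as $b => $n) {
            $table[$a][$b] = $n / $total;
        }
    }
    return $table;
}

// Average log-probability per pair; unseen pairs get a small floor value.
function nameScore(string $name, array $table, float $floor = 1e-4): float
{
    $w = preg_replace('/[^a-z]/', '', strtolower($name));
    $pairs = strlen($w) - 1;
    if ($pairs < 1) {
        return log($floor);
    }
    $sum = 0.0;
    for ($i = 0; $i < $pairs; $i++) {
        $sum += log($table[$w[$i]][$w[$i + 1]] ?? $floor);
    }
    return $sum / $pairs;
}
```

A name would then be flagged when its score falls below a threshold chosen by trying the function on known good and known bogus names. Averaging over the number of pairs keeps long names from being penalised just for their length.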

Jacob