ansaurus

Question

Answer 1

+1 A:

This link might help. You might also be able to plug it through a (possibly modified) speech synthesiser engine and analyse how much trouble it's having generating the speech, without actually generating it.

Chris Dennett 2010-07-15 18:33:29

Answer 2

+3 A:

What if you would use the Google Search API to see if the name returns any results?

Matthew J Morrison 2010-07-15 18:33:40

http://www.google.com/search?q=dfjkdfjkd

KennyTM 2010-07-15 18:35:25

That gets back to a name seeming "human" - rather than a specific language.

Matthew J Morrison 2010-07-15 18:36:34

Clever, but not trustworthy.

Capt Otis 2010-07-15 18:37:25

@Capt Otis - I agree

Matthew J Morrison 2010-07-15 18:38:18

This seems like a sensible idea, if only to highlight the most ridiculous names in an admin UI

Chris Johnson 2010-07-15 18:47:21

@Kenny: oh no, I'm trapped in recursion; the fourth result in that google search is this page!

Andy E 2010-07-15 23:15:49

This won't work... Look at Kenny's example... I mean, "fffffffff" returns a bunch of pages.

Peter Ajtai 2010-07-16 01:15:03

Answer 3

A:

It seems as though you are going to need a fairly complex preg function. I don't want to take the time to write one for you, as you will learn more writing it yourself, but I will help along the way if you post some attempts.

http://php.net/manual/en/function.preg-match.php

Capt Otis 2010-07-15 18:35:59

Good luck with that. Whether it's code or a regular expression it's still going to be impossible not to have false positives.

Wade Williams 2010-07-15 18:39:31

@Wade Williams - is "impossible not to have false positives" a triple negative?

Matthew J Morrison 2010-07-15 18:43:24

Yeah good point. But almost no solution is going to be perfect here.

Capt Otis 2010-07-15 19:08:40

Answer 4

+11 A:

I would recommend concentrating your energy on building a user interface that makes it brain-dead easy to list all new names to an administrator, and a big fat "force to rename" mechanism that minimizes the admin's workload, rather than trying to define the incredibly complex and varied rules that make a name (and program a regular expression to match them!).

Update - one thing comes to mind, though: Second Life used to let you freely specify a first name (maybe they checked it against a database of first names, I don't know) and then gave you a selection of a few hundred pre-defined last names to choose from. For an online RPG, that may already be enough.

Unicron 2010-07-15 18:36:23

Not an answer, but a good answer nonetheless

Yar 2010-07-15 18:38:20

@Daniel 'yar' Rosenstark, I don't get such remarks. I mean, only if people simply answer the question being asked, is *that* a true answer to a question? I really hope not. I mean, if someone asks how to build a house with just a hammer, should one try to help this person on his/her way with just the hammer, or should one answer that it might not be a good idea to use only a hammer and suggest other tools as well? I sure hope it's the latter.

Bart Kiers 2010-07-15 18:48:00

Adding to this, the main problem with other methods is false positives, but you could use another method to sort names by "most likely to be fake".

Brendan Long 2010-07-15 18:48:07

@Bart K. thanks. :) But I don't think @Daniel was attacking the answer, quite the contrary. And strictly speaking, my answer *is* arguably not quite what the OP asked for - even though we do think it's for the better that it isn't.

Unicron 2010-07-15 19:25:54

@Bart K., I was being facetious, mostly. I also was one of the first upvoters of @Unicron's answer (totally unverifiable, but true :)). SOMETIMES, however (obviously not the OP's case), we are confined to a narrow solution space, but you're right. The answer's update is good too.

Yar 2010-07-15 19:31:14

@Unicron, no, I didn't mean that he attacked your answer. I've just seen it happen quite a few times: someone getting an answer that did not address the actual question 100% and then getting a reply that it wasn't really an answer (which is nonsense, IMO).

Bart Kiers 2010-07-15 19:33:37

@Daniel, yeah, sorry, I probably came across a bit harsh. It's probably because I've seen the *"Not an answer"* without the part *"but a good answer nonetheless"* and finally decided to give a reply (which I haven't done in the past...). :)

Bart Kiers 2010-07-15 19:37:06

... and I finally wanted to use my *house-building-analogy* , of course. :)

Bart Kiers 2010-07-15 19:38:45

@Bart K. no worries, we're all trying to use as many cool analogies as possible where applicable.

Yar 2010-07-15 22:56:00

Answer 5

+2 A:

I had this issue as well. An easy way to solve it is to force user names to validate against a database of world-wide names. Essentially you have a database on the backend with a few hundred thousand first and last names for both genders, and make their name match.

With a little bit of searching on Google, you can find many name databases.
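A minimal sketch of the lookup idea, assuming the name database has been loaded into two sets. The tiny arrays and the function name `isKnownName` are hypothetical placeholders for a real name table, and the lowercasing only handles ASCII names.

```php
<?php
// Hypothetical in-memory stand-in for a real database of known names.
// array_flip() turns the lists into sets so that lookups are O(1).
$knownFirstNames = array_flip(['james', 'maria', 'chen', 'fatima']);
$knownLastNames  = array_flip(['smith', 'garcia', 'wang', 'khan']);

/**
 * Returns true when both parts of the name appear in the known-name sets.
 * Comparison is case-insensitive for ASCII letters only.
 */
function isKnownName(string $first, string $last, array $firstNames, array $lastNames): bool
{
    $first = strtolower(trim($first));
    $last  = strtolower(trim($last));
    return isset($firstNames[$first]) && isset($lastNames[$last]);
}
```

Any real name that is missing from the database gets rejected, so the quality of the name list decides how usable this is.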

George 2010-07-15 18:37:24

Answer 6

+2 A:

Could I somehow check so at least you cant use more than 2 of the same letter beside each other?? and also maybe if it contains vowels

If you just want this, you can do:

preg_match('/(.)\\1\\1/i', $name);

This returns 1 if any character appears three or more times in a row.
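A rough sketch that combines the repeat check with the "contains vowels" part of the quoted question. The function name `looksSuspicious` is hypothetical, and counting "y" as a vowel is an assumption made so that names such as "Lynn" pass.

```php
<?php
/**
 * Hypothetical helper: flags a name that repeats any character three or
 * more times in a row, or that contains no vowel at all.
 */
function looksSuspicious(string $name): bool
{
    // Same pattern as above: one character followed by two copies of itself.
    $hasTripleRepeat = preg_match('/(.)\1\1/i', $name) === 1;

    // Assumption: "y" counts as a vowel.
    $hasVowel = preg_match('/[aeiouy]/i', $name) === 1;

    return $hasTripleRepeat || !$hasVowel;
}
```

Under these rules "Jimmy" passes, while "fffffffff" and "dfjkdfjkd" are flagged.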

Artefacto 2010-07-15 18:37:45

Answer 7

+6 A:

You could use a metaphone implementation and then look for "unnatural" patterns:

http://www.php.net/manual/en/function.metaphone.php

This is the PHP function for metaphone string generation. You pass in a string and it returns the phonetic representation of the text. You could, in theory, pass a large number of "human" names and then store a database of valid combinations of phonemes. To test a questionable name, just see if the combinations of phonemes are in the database.
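One possible reading of this idea, sketched under stated assumptions: the "combinations of phonemes" are taken to be adjacent two-character pairs of the `metaphone()` key, and the set of valid pairs is built from a list of known names. The function names are hypothetical, and a real training list would need to be far larger than the one in a quick test.

```php
<?php
/**
 * Splits the metaphone key of a name into overlapping two-character pairs.
 */
function phonemePairs(string $name): array
{
    $key = metaphone($name);
    $pairs = [];
    for ($i = 0; $i < strlen($key) - 1; $i++) {
        $pairs[] = substr($key, $i, 2);
    }
    return $pairs;
}

/**
 * Builds the set of pairs seen in a list of known "human" names.
 */
function buildPairSet(array $names): array
{
    $set = [];
    foreach ($names as $name) {
        foreach (phonemePairs($name) as $pair) {
            $set[$pair] = true;
        }
    }
    return $set;
}

/**
 * Returns true when every pair in the name was seen in the known names.
 * A name whose key is shorter than two characters has no pairs and passes.
 */
function soundsFamiliar(string $name, array $pairSet): bool
{
    foreach (phonemePairs($name) as $pair) {
        if (!isset($pairSet[$pair])) {
            return false;
        }
    }
    return true;
}
```

Because unseen pairs cause an outright rejection, a small training list will reject many legitimate names; counting pair frequencies and applying a threshold would be a softer variant.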

Hope this helps!

mattbasta 2010-07-15 18:38:17

This seems closer to what the OP was looking for. An algorithm has already been documented and implemented: http://www.sil.org/computing/lascruces.html

Kilanash 2010-07-15 19:44:53

Answer 8

+3 A:

I say take @Unicron's approach of easy admin rejection, but on each rejection, add the name to a database of banned names. You might be able to use this data to detect specific attacks that generate large numbers of users based on patterns. It will, of course, be very difficult to detect one-offs.

sparkey0 2010-07-15 18:41:50

Good idea storing away precedents!

Unicron 2010-07-15 19:22:40

Answer 9

+3 A:

Would limiting the number of consonants or vowels in a row, and preventing repeated letters, help? As a regex:

// Reject 4 consecutive consonants, 4 consecutive vowels, or the same letter 3 times in a row.
if (preg_match('/[bcdfghjklmnpqrstvwxyz]{4}|[aeiou]{4}|([a-z])\1{2}/i', $name)) {
    // reject
}

Possibly use iconv with ASCII//TRANSLIT if you allow accented characters.
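A hedged sketch of how the transliteration step could sit in front of the regex. The wrapper name `failsLetterRunCheck` is hypothetical; what `//TRANSLIT` produces for a given accented character depends on the platform's iconv implementation and locale, so the result for non-ASCII input is not guaranteed.

```php
<?php
/**
 * Hypothetical wrapper: transliterates a UTF-8 name to ASCII, then applies
 * the consonant/vowel/repeat pattern from the answer.
 */
function failsLetterRunCheck(string $name): bool
{
    // iconv() returns false on failure; fall back to the raw input then.
    // The @ suppresses the notice iconv() emits for invalid input.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $name);
    if ($ascii === false) {
        $ascii = $name;
    }

    $pattern = '/[bcdfghjklmnpqrstvwxyz]{4}|[aeiou]{4}|([a-z])\1{2}/i';
    return preg_match($pattern, $ascii) === 1;
}
```

Note that a real surname such as "Schmidt" starts with four consonants and is rejected by this pattern, which illustrates the false-positive concern raised in the other answers.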

Wrikken 2010-07-15 18:46:44

Answer 10

A:

What do you think about delegating the responsibility of creating users to a third-party source (like Facebook, Twitter, OpenID...)?

Doing that will not solve your problem, but it will be more work for a user to create additional accounts - which (assuming that the users are lazy, since most are) should discourage the creation of additional "dummy" users.

Matthew J Morrison 2010-07-15 18:49:08

Answer 11

+1 A:

You should try implementing a modified version of a Naive Bayes spam filter. For example, in normal spam detection you calculate the probability of a word being spam and use individual word probabilities to determine if the whole message is spam.

Similarly, you could download a word list, and compute the probability that a pair of letters belongs to a real word.

E.g., create a 26x26 table, say T. Let the 5th row represent the letter e and let entry T(5,1) be the number of times ea appeared in your word list. Once you're done counting, divide each element in each row by the sum of that row, so that T(5,1) is now the fraction of letter pairs starting with e in your word list that are ea.

Now you can use the individual pair probabilities (e.g. in Jimy that would be {Ji, im, my}) to check whether Jimy is an acceptable name or not. You'll probably have to determine the right probability to threshold at, but try it out; it's not that hard to implement.
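The table described above can be sketched roughly as follows. This is a simplification, not a full Naive Bayes filter: it scores a name by its least likely letter pair instead of combining all pair probabilities. The function names are hypothetical, nested arrays keyed by letter stand in for the 26x26 table, and only the letters a-z are considered.

```php
<?php
/**
 * Builds row-normalised letter-pair frequencies from a word list:
 * $table['e']['a'] is the fraction of pairs starting with "e" that are "ea".
 */
function buildPairTable(array $words): array
{
    $counts = [];
    foreach ($words as $word) {
        // Keep only a-z so that every key is a single lowercase letter.
        $letters = preg_replace('/[^a-z]/', '', strtolower($word));
        for ($i = 0; $i < strlen($letters) - 1; $i++) {
            $a = $letters[$i];
            $b = $letters[$i + 1];
            $counts[$a][$b] = ($counts[$a][$b] ?? 0) + 1;
        }
    }

    // Divide each count by the sum of its row.
    $table = [];
    foreach ($counts as $a => $row) {
        $total = array_sum($row);
        foreach ($row as $b => $n) {
            $table[$a][$b] = $n / $total;
        }
    }
    return $table;
}

/**
 * Returns the lowest pair probability found in the name.
 * A pair never seen in the word list scores 0.0.
 */
function weakestPairProbability(string $name, array $table): float
{
    $letters = preg_replace('/[^a-z]/', '', strtolower($name));
    $lowest = 1.0;
    for ($i = 0; $i < strlen($letters) - 1; $i++) {
        $p = $table[$letters[$i]][$letters[$i + 1]] ?? 0.0;
        $lowest = min($lowest, $p);
    }
    return $lowest;
}
```

A name would then be rejected when `weakestPairProbability()` falls below a threshold chosen by experiment. With a large word list, a product or average of the pair probabilities may separate names better than the minimum.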

Jacob 2010-07-15 23:48:31

check if a name seems "human"?