ansaurus

Question

Regex for names.

Answer 1

A:

If you add spaces, then "He went to the market on Sunday" would be a valid name.

I don't think you can do this with a regex. You cannot easily detect names in a chunk of text with a regex; you would need a dictionary of approved names and search for those. Any names not on the list wouldn't be detected.
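A minimal sketch of the dictionary approach described above, written in Python for illustration. `KNOWN_NAMES` and `find_known_names` are hypothetical; a real list of approved names would be far larger, and any name missing from it is simply not found.

```python
import re

# Hypothetical list of approved names; a real one would be much larger.
KNOWN_NAMES = {"Paul", "Matthew", "Osama", "Kent"}

def find_known_names(text):
    """Return the words in text that appear in KNOWN_NAMES, in order."""
    # Split into runs of letters, apostrophes and hyphens; other punctuation is dropped.
    words = re.findall(r"[A-Za-z'-]+", text)
    return [w for w in words if w in KNOWN_NAMES]

print(find_known_names("He went to the market on Sunday with Paul and Kent."))
# ['Paul', 'Kent']
```

Note that "Sunday" is capitalized but not reported, because only the list decides what counts as a name.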

Osama ALASSIRY 2008-11-08 20:43:00

Oh man, where's the name change form? I'm totally changing my name to "He went to the market on Sunday".

Paul Tomblin 2008-11-08 20:46:30

You can't pull names out of a body of text, but you could potentially match a given string to see if it is a 'valid' name. Why you would bother in production is beyond me, but this isn't production; this is learning regex.

Matthew Scharley 2008-11-08 20:49:48

Right, my aim is not to find a name in a sentence or paragraph or whatever, but to check a given name for some semblance of normality.

Humpton 2008-11-08 20:56:10

Answer 2

A:

Give up. Every rule you can think of has exceptions in some culture or other. Even if that "culture" is geeks who like to legally change their names to "37eet".

Paul Tomblin 2008-11-08 20:45:07

Answer 3

+9 A:

Hyphenated Names (Worthington-Smythe)

Add a - to the second character class. The easiest way to do that is to put it at the start, so that it can't possibly be interpreted as a range operator (as in a-z).

^[A-Z][-a-zA-Z]+$
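A quick check of the hyphenated-name pattern, sketched in Python's re module, whose syntax for these simple patterns matches PCRE's. Here `fullmatch` stands in for the `^...$` anchors; the function name `is_hyphenated_name` is made up for illustration.

```python
import re

# The answer's pattern without ^/$: one capital, then letters or hyphens.
# '-' is first in the class, so it is a literal rather than a range.
HYPHENATED_NAME = re.compile(r"[A-Z][-a-zA-Z]+")

def is_hyphenated_name(name: str) -> bool:
    # fullmatch anchors at both ends; unlike '$' in Python, it does not
    # let a single trailing newline slip through.
    return HYPHENATED_NAME.fullmatch(name) is not None

for candidate in ["Worthington-Smythe", "Smith", "worthington-smythe", "D'Angelo"]:
    print(candidate, is_hyphenated_name(candidate))
# Worthington-Smythe True
# Smith True
# worthington-smythe False
# D'Angelo False
```

The pattern also accepts oddities such as "A--", since nothing stops hyphens from repeating or ending the name.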

Names with Apostrophes (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example, might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.
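To compare the naive and 'better' apostrophe patterns side by side, here is a small Python sketch (again using `fullmatch` in place of `^...$`). Because the patterns sit in double-quoted raw strings, the apostrophe needs no escaping here; in a single-quoted string it would. The test names are chosen for illustration.

```python
import re

NAIVE = re.compile(r"[A-Z][-'a-zA-Z]+")    # apostrophe allowed anywhere after the capital
BETTER = re.compile(r"[A-Z]'?[-a-zA-Z]+")  # at most one apostrophe, only in second position

for candidate in ["D'Angelo", "O'Brien", "D''Angelo", "Ang'''elo", "Dell'Acqua"]:
    print(candidate,
          NAIVE.fullmatch(candidate) is not None,
          BETTER.fullmatch(candidate) is not None)
# D'Angelo True True
# O'Brien True True
# D''Angelo True False
# Ang'''elo True False
# Dell'Acqua True False
```

The stricter pattern rejects the junk, but it also rejects real surnames whose apostrophe is not in the second position, such as Dell'Acqua.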

Names with Spaces (Van der Humpton) - capitals in the middle, which may or may not be required, are way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.
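A Python sketch contrasting the naive "space is just another character" pattern with a word-by-word one. The word-by-word form assumes a + quantifier on each word (a first word, then zero or more groups of one space plus a word). Note that both accept "He went to the market on Sunday", which is exactly the objection raised in the first answer.

```python
import re

# Naive: a space is simply one more allowed character.
NAIVE_SPACES = re.compile(r"[A-Z]'?[- a-zA-Z]+")
# Word-by-word, assuming '+' on each word: first word, then " word" groups.
WORD_BY_WORD = re.compile(r"[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*")

for candidate in ["Van der Humpton", "He went to the market on Sunday", "Van  der", "Van der "]:
    print(repr(candidate),
          NAIVE_SPACES.fullmatch(candidate) is not None,
          WORD_BY_WORD.fullmatch(candidate) is not None)
# 'Van der Humpton' True True
# 'He went to the market on Sunday' True True
# 'Van  der' True False
# 'Van der ' True False
```

The word-by-word version mainly buys you rejection of doubled and trailing spaces.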

Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?

Anyway, as you can see, regexes have a habit of growing very quickly...

Matthew Scharley 2008-11-08 20:46:36

Humpton 2008-11-08 20:52:32

This doesn't handle international names. One of the comments below pointed out the use of \p{L}, but you can read a lot more about Unicode character classes at http://www.regular-expressions.info/unicode.html

Kimball Robinson 2010-08-18 17:44:15

Answer 4

+1 A:

^[A-Z][a-zA-Z '&-]*[A-Za-z]$

This will accept anything that starts with an uppercase letter, continues with zero or more letters, spaces, hyphens, ampersands or apostrophes, and ends with a letter.
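Trying this pattern in Python (with `fullmatch` replacing `^...$`) shows both what it accepts and two edge cases worth knowing: the first and last characters are separate tokens, so a one-letter name fails, and accented letters fall outside [a-zA-Z]. The helper `accepts` is illustrative only.

```python
import re

# Capital first, letter last; letters, spaces, apostrophes, ampersands
# or hyphens in between ('-' is last in the class, so it is literal).
PATTERN = re.compile(r"[A-Z][a-zA-Z '&-]*[A-Za-z]")

def accepts(name: str) -> bool:
    return PATTERN.fullmatch(name) is not None

for candidate in ["Ben & Jerry", "D'Angelo", "Worthington-Smythe", "J", "Smith-", "Jérémie"]:
    print(repr(candidate), accepts(candidate))
# 'Ben & Jerry' True
# "D'Angelo" True
# 'Worthington-Smythe' True
# 'J' False
# 'Smith-' False
# 'Jérémie' False
```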

Robert Gamble 2008-11-08 20:48:02

This does not account for international characters.

Kimball Robinson 2010-08-18 17:42:34

Answer 5

+4 A:

Basically, I agree with Paul... You will always find exceptions, like di Caprio, DeVil, or such.

Remarks on your message: in PHP, ereg is generally seen as obsolete (slow, incomplete) in favor of preg (PCRE regexes).
And you should try a regex tester, like the powerful Regex Coach: testers are great for quickly trying regexes against arbitrary strings.

If you really need to solve your problem and aren't satisfied with the above answers, just ask and I will give it a go.

PhiLho 2008-11-08 20:54:19

Firstly, I'll add exploring preg to my list. Then, I'll investigate a tester. And, I totally accept that people like di Caprio will mess up my first musings... This does have a real use, but mostly it's a learning experience. What appeared here in minutes has given me a lot to go on.

Humpton 2008-11-08 21:03:42

Answer 6

+1 A:

See this question for more "name-detection" related stuff:

http://stackoverflow.com/questions/256729/regex-to-match-a-maximum-of-4-spaces

Basically, you have a problem in that there are effectively no characters in existence that can't form part of a legal name string.

If you are still limiting yourself to words without ä, ü, æ, ß and other non-ASCII characters, get yourself a copy of a Unicode character table and realise how many thousands of valid characters your simple regex would miss.
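Instead of reading a character table by hand, you can count the letters with Python's standard unicodedata module. This sketch counts every code point whose general category is a letter (Lu, Ll, Lt, Lm, Lo) and splits them into ASCII and non-ASCII. The exact non-ASCII total depends on the Unicode version bundled with your Python, so no figure is asserted here.

```python
import sys
import unicodedata

def count_letters():
    """Count code points whose Unicode general category is a letter (L*)."""
    ascii_letters = 0
    other_letters = 0
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)).startswith("L"):
            if cp < 128:
                ascii_letters += 1
            else:
                other_letters += 1
    return ascii_letters, other_letters

ascii_count, other_count = count_letters()
# [a-zA-Z] covers only the first number (52); the second is what it misses.
print(f"ASCII letters: {ascii_count}, non-ASCII letters: {other_count}")
```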

Kent Fredric 2008-11-08 21:19:09

Answer 7

+3 A:

I don't really have a whole lot to add to a regex that takes care of names because there are already some good suggestions here, but if you want a few resources for learning more about regular expressions, you should check out:

Regex Library's Cheat Sheet
Another cheat sheet
A regex tutorial on the DevNetwork forums: Part 1 and Part 2
PHP builder's tutorial
And if you ever need to do regex for JavaScript (it's a slightly different flavor), try JavaScript Kit, or this resource, or Mozilla's reference

VirtuosiMedia 2008-11-08 21:22:29

Answer 8

+3 A:

I second the 'give up' advice. Even if you consider numbers, hyphens, apostrophes and such, something like [a-zA-Z] still wouldn't catch international names (for example, those having šđčćž, or Cyrillic alphabet, or Chinese characters...)

But... why are you even trying to verify names? What errors are you trying to catch? Don't you think people know how to write their name better than you? ;) Seriously, the only thing you can do by trying to verify names is to irritate people with unusual names.

Domchi 2008-11-08 21:52:01

Answer 9

+3 A:

While I agree with the answers saying you basically can't do this with regex, I will point out that some of the objections (internationalized characters) can be resolved by using UTF strings and the \p{L} character class (matches a Unicode "letter").
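Python's built-in re module has no \p{L}, so this sketch uses the common stand-in [^\W\d_] ("a word character that is not a digit or underscore"). That is close to \p{L} but not identical: it also admits a few numeric symbols such as '²'. The sketch drops the capital-letter rule (many scripts have no case) and normalizes input to NFC, because an accent typed as a separate combining mark is not a letter on its own. `is_name` and `is_ascii_name` are illustrative names.

```python
import re
import unicodedata

# Stand-in for \p{L}; see the caveat above.
LETTER = r"[^\W\d_]"
WORD = rf"{LETTER}+(?:['-]{LETTER}+)*"     # letters joined by single ' or -
NAME = re.compile(rf"{WORD}(?: {WORD})*")  # words separated by single spaces
ASCII_NAME = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*(?: [A-Za-z]+(?:['-][A-Za-z]+)*)*")

def is_name(s: str) -> bool:
    # NFC turns "e" + U+0301 (combining acute, not matched by \w) into U+00E9.
    return NAME.fullmatch(unicodedata.normalize("NFC", s)) is not None

def is_ascii_name(s: str) -> bool:
    return ASCII_NAME.fullmatch(s) is not None

for candidate in ["Jérémie O'Connor", "Иван Петров", "李小龙", "R2-D2"]:
    print(repr(candidate), is_ascii_name(candidate), is_name(candidate))
# "Jérémie O'Connor" False True
# 'Иван Петров' False True
# '李小龙' False True
# 'R2-D2' False False
```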

eyelidlessness 2008-11-09 06:47:34

You can read more about Unicode and regular expressions at http://www.regular-expressions.info/unicode.html

Kimball Robinson 2010-08-18 17:44:42

Answer 10

+2 A:

.+
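For completeness, a Python sketch of what .+ actually accepts when anchored at both ends. It requires at least one character, and because . does not match a newline by default, a multi-line string is rejected. A whitespace-only string still passes, so stripping the input first may be worth considering. The function name is made up.

```python
import re

def accepts_anything(name: str) -> bool:
    # '.+' needs one or more characters; '.' excludes '\n' by default.
    return re.fullmatch(r".+", name) is not None

for candidate in ["37eet", "", "Ben\nJerry", "   "]:
    print(repr(candidate), accepts_anything(candidate))
# '37eet' True
# '' False
# 'Ben\nJerry' False
# '   ' True
```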

Kevin 2008-11-09 06:59:01

Answer 11

+1 A:

This regex is perfect for me.

^([ \u00c0-\u01ffa-zA-Z'\-])+$

It works fine in PHP environments using preg_match(), but doesn't work everywhere.

It matches Jérémie O'Co-nor, so I think it matches all UTF-8 names.
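A quick way to probe the range in this pattern is to transcribe it into Python, whose re module accepts \uXXXX escapes in string patterns. U+00C0 to U+01FF spans the Latin-1 Supplement letters, Latin Extended-A and part of Latin Extended-B, so in this sketch it handles many Western and Central European names but not Cyrillic or Vietnamese, and it also lets through the multiplication sign × (U+00D7). The helper `matches` is illustrative.

```python
import re

# The answer's pattern, transcribed for Python's re module.
PATTERN = re.compile(r"^([ \u00c0-\u01ffa-zA-Z'\-])+$")

def matches(name: str) -> bool:
    return PATTERN.match(name) is not None

for candidate in ["Jérémie O'Co-nor", "Đorđević", "Иван", "Nguyễn", "×"]:
    print(repr(candidate), matches(candidate))
# "Jérémie O'Co-nor" True
# 'Đorđević' True
# 'Иван' False
# 'Nguyễn' False
# '×' True
```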

Daan 2010-01-11 21:01:59
