I'd like a Regular Expression for C# that matches "Johnson", "Del Sol", or "Del La Range"; in other words, it should match words with spaces in the middle but no space at the start or at the end.

A: 

The ? modifier is your friend. Placed after a quantifier, it makes a shortest-possible (lazy) match instead of a greedy one. Use it for the first name, as in:

^(.+?) (.+)$

Group 1 grabs everything up to the first space, group 2 gets the rest.

Of course, now what do you do if the first name contains spaces?

Paul Roub
Nice and simple, but I think it will match "238 39592" as well, which aren't words.
Stuart Branham
then replace "." with "\w" or "[a-zA-Z]"
Rich
Not sure if the OP wants to match the last name by itself or within a string containing both the first and last names... I supposed the former, while you seem to have done the latter. Still, it appears your regex allows spaces at the start or end, which needs to be fixed.
Noldorin
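A minimal C# sketch of the two-group split discussed in this thread, assuming the input is a whole "first last" string. The class name and the sample name "Maria Del La Range" are made up for illustration. It also includes the letters-only variant suggested in the comments, written here for ASCII letters only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazySplitDemo
{
    // Lazy first group stops at the first space; the second group takes the rest.
    public static readonly Regex AnyChars = new Regex(@"^(.+?) (.+)$");

    // Variant from the comments: "." replaced by letter classes, so digit-only
    // input is rejected. ASCII-only is an assumption of this sketch.
    public static readonly Regex LettersOnly = new Regex(@"^([a-zA-Z]+?) ([a-zA-Z ]+)$");

    public static void Main()
    {
        Match m = AnyChars.Match("Maria Del La Range");
        Console.WriteLine(m.Groups[1].Value); // Maria
        Console.WriteLine(m.Groups[2].Value); // Del La Range

        Console.WriteLine(AnyChars.IsMatch("238 39592"));    // True
        Console.WriteLine(LettersOnly.IsMatch("238 39592")); // False

        // The second group still allows a trailing space.
        Console.WriteLine(LettersOnly.IsMatch("Maria Del "));  // True
    }
}
```

Neither pattern handles a first name that itself contains a space, which is the open question raised in the answer.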
A: 

Try something like this:

^[^\s][\w\s]*[^\s]$
Andrew Hare
I don't think last names can contain numbers...
Daniel LeCheminant
+3  A: 

This should do the job:

^[a-zA-Z][a-zA-Z ]*[a-zA-Z]$

Edit: Here's a slight improvement that allows one-letter names and hyphens/apostrophes in the name:

^[a-zA-Z'][a-zA-Z' -]*[a-zA-Z']?$
Noldorin
Malcolm X would not be happy about this... (requiring minimum of 2 letter last names that is...)
Daniel LeCheminant
The shortest REAL name I can think of is "Ng." Should be fine. ;)
Stuart Branham
A non-zero number of people have the "real" last name of "U"...
Daniel LeCheminant
Yeah, I noticed that upon review, but didn't bother changing because I didn't consider a one-letter last name... Post is edited now anyway with a few other improvements.
Noldorin
+1 for tackling ' and -. (I don't know if the first character needs to accept an apostrophe though... or if a-- should be a valid last name)
Daniel LeCheminant
@Daniel: Cheers. And yeah, it *probably* doesn't need to accept ' as the first char, but can't hurt. Note that it shouldn't accept a hyphen as the last char, so a-b would be valid but not a-- (unless one of my quantifiers is wrong).
Noldorin
How would I change this to only allow single spaces inside the name, not more than one space?
Caveatrob
I take it you mean not more than one space in a row? Try the following (it may not quite work, as I haven't tested): ^[a-zA-Z'](([a-zA-Z])+[' -]?)*[a-zA-Z']?$
Noldorin
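A rough C# check of the edited pattern from this answer, plus one possible way to allow only single separators. The sketch writes the character class as [a-zA-Z' -], with the hyphen last so it is read as a literal; a hyphen between ' and a space would be parsed by .NET as a reversed range and rejected. The `SingleSeparator` pattern is not from the thread; it is an assumption about what "only single spaces inside the name" could look like.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // Edited pattern from the answer, hyphen moved to the end of the class.
    public const string Edited = @"^[a-zA-Z'][a-zA-Z' -]*[a-zA-Z']?$";

    // Hypothetical stricter variant: runs of letters/apostrophes joined by
    // exactly one space or hyphen, so no doubled or trailing separators.
    public const string SingleSeparator = @"^[a-zA-Z']+(?:[ -][a-zA-Z']+)*$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("O'Connor", Edited));  // True
        Console.WriteLine(Regex.IsMatch("a-b", Edited));       // True
        Console.WriteLine(Regex.IsMatch("a--", Edited));       // True: final class is optional
        Console.WriteLine(Regex.IsMatch("Del Sol ", Edited));  // True: trailing space slips through

        Console.WriteLine(Regex.IsMatch("X", SingleSeparator));            // True
        Console.WriteLine(Regex.IsMatch("Del La Range", SingleSeparator)); // True
        Console.WriteLine(Regex.IsMatch("a--", SingleSeparator));          // False
        Console.WriteLine(Regex.IsMatch("Del Sol ", SingleSeparator));     // False
        Console.WriteLine(Regex.IsMatch("Del  Sol", SingleSeparator));     // False: two spaces in a row
    }
}
```

Because the last character class in the edited pattern is optional, the middle class can consume a trailing hyphen or space on its own; requiring at least one letter after every separator avoids that.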
+4  A: 
^\p{L}+(\s+\p{L}+)*$

This regex has the following features:

  • Will match a one letter last name (e.g. Malcolm X's last name)
  • Will not match last names containing numbers (like anything with a \w or a [^ ] will)
  • Matches unicode letters

But what about last names like "O'Connor" or hyphenated last names ... hmm ...

Daniel LeCheminant
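A short C# sketch of the \p{L} pattern from this answer. The `WithPunctuation` variant is only one guess at how the open question about "O'Connor" and hyphenated names might be handled; it is not from the thread. It also swaps $ for \z, since in .NET $ also matches just before a final newline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeNameDemo
{
    // Pattern from the answer: Unicode letters separated by whitespace runs.
    public const string LettersAndSpaces = @"^\p{L}+(\s+\p{L}+)*$";

    // Hypothetical extension: a single space, apostrophe, or hyphen between
    // letter runs, anchored with \z so a trailing newline is rejected.
    public const string WithPunctuation = @"^\p{L}+(?:[ '-]\p{L}+)*\z";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("X", LettersAndSpaces));                // True
        Console.WriteLine(Regex.IsMatch("N\u00FA\u00F1ez", LettersAndSpaces));  // True
        Console.WriteLine(Regex.IsMatch("R2 D2", LettersAndSpaces));            // False
        Console.WriteLine(Regex.IsMatch("O'Connor", LettersAndSpaces));         // False
        Console.WriteLine(Regex.IsMatch("Del Sol\n", LettersAndSpaces));        // True: $ allows a final newline

        Console.WriteLine(Regex.IsMatch("O'Connor", WithPunctuation));          // True
        Console.WriteLine(Regex.IsMatch("Kai-shek", WithPunctuation));          // True
        Console.WriteLine(Regex.IsMatch("Del Sol\n", WithPunctuation));         // False
    }
}
```

Names stored with combining marks (decomposed Unicode) would need \p{M} handled as well, which this sketch does not attempt.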
A: 

I think this is more what you were looking for:

^[^ ][a-zA-Z ]+[^ ]$

This should match a non-space character at the beginning of the line, then letters or spaces, then a non-space character at the end.

This works in irb; the last time I worked with C#, I used similar regexes:

(zero is good, nil means failed)

>> "Di Giorno" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> 0
>> "DiGiorno" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> 0
>> " DiGiorno" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> nil
>> "DiGiorno " =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> nil
>> "Di Gior no" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> 0
dexedrine
Using the [^ ] will match last names starting or ending with numbers, punctuation, etc...
Daniel LeCheminant
Danny's right. I responded with the same solution and retracted it when I realized this.
Stuart Branham
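For comparison, here is the irb session from this answer redone as a C# sketch. Ruby's =~ returns a match index or nil, while Regex.IsMatch returns a bool; also, ^ and $ are line anchors in Ruby but string anchors by default in .NET. The class name and the last two inputs are additions for illustration, chosen to show the issue raised in the comments.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IrbPortDemo
{
    // Same pattern as in the irb session.
    public const string Pattern = @"^[^ ][a-zA-Z ]+[^ ]$";

    public static bool Check(string name) => Regex.IsMatch(name, Pattern);

    public static void Main()
    {
        Console.WriteLine(Check("Di Giorno"));  // True
        Console.WriteLine(Check("DiGiorno"));   // True
        Console.WriteLine(Check(" DiGiorno"));  // False
        Console.WriteLine(Check("DiGiorno "));  // False
        Console.WriteLine(Check("Di Gior no")); // True

        // [^ ] accepts any non-space character, so digits and punctuation
        // pass at either end, and names shorter than three characters fail.
        Console.WriteLine(Check("9Giorno!"));   // True
        Console.WriteLine(Check("Ng"));         // False
    }
}
```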
+2  A: 

In the name "Ṣalāḥ ad-Dīn Yūsuf ibn Ayyūb" (see http://en.wikipedia.org/wiki/Saladdin), which is the first name, and which is the last? What about in the name "Roberto Garcia y Vega" (invented)? "Chiang Kai-shek" (see http://en.wikipedia.org/wiki/Chang_Kai-shek)?

Spaces in names are the least of your problems! See http://stackoverflow.com/questions/620118/personal-names-in-a-global-application-what-to-store.

John Saunders
I agree. No matter how hard you try, you will always find names that don't match correctly. I mean, if you don't have complete control over what names you are parsing.
Sergio Acosta