I'm looking for a good JavaScript regex to convert names to proper case. For example:

John SMITH = John Smith

Mary O'SMITH = Mary O'Smith

E.t MCHYPHEN-SMITH = E.T McHyphen-Smith  

John Middlename SMITH = John Middlename Smith

Well, you get the idea.

Has anyone come up with a comprehensive solution?

A: 

Unfortunately there are too many different name formats to do this correctly. John-Joe MacDonald is always going to be a nuisance!

harriyott
You got that right! He's still the same rapscallion he was in grade school!
eyelidlessness
You should have seen what he did to Lilly-Anne da Silva's satchel; such a little scallywag!
harriyott
A: 

Agreed, it will never be perfect, but I'm looking to get the most common cases right. That pretty much means capitalizing the first letter of every "word" and, I guess, treating hyphens and apostrophes like spaces.

raccettura
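A minimal JavaScript sketch of that idea, assuming names contain only unaccented ASCII letters: every run of letters counts as a word, so spaces, hyphens, apostrophes and full stops all act as separators. The function name capitalizeWords is made up for illustration.

```javascript
// Hypothetical helper: lowercase everything, then capitalize each run of letters.
// Only a-z is treated as a letter, so accented characters act as separators.
function capitalizeWords(name) {
    return name.toLowerCase().replace(/[a-z]+/g, function (word) {
        return word.charAt(0).toUpperCase() + word.slice(1);
    });
}

console.log(capitalizeWords("John SMITH"));          // John Smith
console.log(capitalizeWords("Mary O'SMITH"));        // Mary O'Smith
console.log(capitalizeWords("E.t MCHYPHEN-SMITH"));  // E.T Mchyphen-Smith
```

Note that this gets the common cases but not "McHyphen"; prefixes like Mc and Mac need an extra rule.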
A: 

Wimps!.... Here's my second attempt. It handles "John SMITH", "Mary O'SMITH", "John Middlename SMITH", "E.t MCHYPHEN-SMITH" and "JoHn-JOE MacDoNAld":

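// Verbatim string so that \w reaches the regex engine; IgnoreCase so "MC", "Mac" and "mac" all match.
Regex fixnames = new Regex(@"(Ma?C)?(\w)(\w*)(\W*)", RegexOptions.IgnoreCase);
string newName = fixnames.Replace(badName, NameFixer);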


public static string NameFixer(Match match)
{
    string mc = "";
    if (match.Groups[1].Success)
    {
        // "Mac" is three characters long, "Mc" is two.
        mc = match.Groups[1].Length == 3 ? "Mac" : "Mc";
    }

    return mc
        + match.Groups[2].Value.ToUpper()
        + match.Groups[3].Value.ToLower()
        + match.Groups[4].Value;
}

NOTE: By the time I realized you wanted a JavaScript solution instead of a .NET one, I was having too much fun to stop....

James Curran
That's not JavaScript ;) Also, this really isn't the job for regular expressions alone - combined with lexical parsing you could get a 90% capable system, I think.
Peter Bailey
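One way to read "regex plus lexical parsing", sketched in JavaScript: split the name into word and separator tokens first, then pick the case of each word from the word itself and from its position. Everything here is an assumption made for illustration: the names PARTICLES, tokenize, fixToken and fixNameByTokens, the short particle list, and the choice to handle only the "Mc" prefix (since "Mac" is ambiguous, as in "Mack").

```javascript
// Illustrative list only; a real one would be much longer.
var PARTICLES = ["da", "de", "di", "du", "la", "le", "van", "von"];

function capitalize(word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
}

// Runs of ASCII letters become word tokens; everything else is a separator token.
function tokenize(name) {
    return name.match(/[A-Za-z]+|[^A-Za-z]+/g) || [];
}

function fixToken(token, index, tokens) {
    if (!/^[A-Za-z]+$/.test(token)) {
        return token;                       // separators pass through unchanged
    }
    var word = token.toLowerCase();
    // True when the word is followed by white space and then another word.
    var followedByWord = /^\s+$/.test(tokens[index + 1] || "") && index + 2 < tokens.length;
    if (index > 0 && followedByWord && PARTICLES.indexOf(word) !== -1) {
        return word;                        // "da Silva", but a leading "Van" stays capitalized
    }
    if (/^mc[a-z]+$/.test(word)) {
        return "Mc" + capitalize(word.slice(2));
    }
    return capitalize(word);
}

function fixNameByTokens(name) {
    return tokenize(name).map(fixToken).join("");
}

console.log(fixNameByTokens("E.t MCHYPHEN-SMITH"));   // E.T McHyphen-Smith
console.log(fixNameByTokens("lilly-anne DA SILVA"));  // Lilly-Anne da Silva
console.log(fixNameByTokens("VAN MORRISON"));         // Van Morrison
```

The position check is the part a single regex replace struggles with: the same token "van" is cased differently depending on where it sits in the name.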
+1  A: 

Something like this?

function fix_name(name) {
    // prefix is an optional leading "Mc"/"Mac"; word is the rest of the run of letters.
    var replacer = function (whole, prefix, word) {
        var ret = [];
        if (prefix) {
            ret.push(prefix.charAt(0).toUpperCase());
            ret.push(prefix.slice(1).toLowerCase());
        }
        ret.push(word.charAt(0).toUpperCase());
        ret.push(word.slice(1).toLowerCase());
        return ret.join('');
    };
    var pattern = /\b(ma?c)?([a-z]+)/ig;
    return name.replace(pattern, replacer);
}
MizardX
A: 

Mark Summerfield has done a comprehensive job of this with Lingua::EN::NameCase:

KEITH               Keith
LEIGH-WILLIAMS      Leigh-Williams
MCCARTHY            McCarthy
O'CALLAGHAN         O'Callaghan
ST. JOHN            St. John
VON STREIT          von Streit
VAN DYKE            van Dyke
AP LLWYD DAFYDD     ap Llwyd Dafydd
henry viii          Henry VIII
louis xiv           Louis XIV

The above is written in Perl, but it makes heavy use of regular expressions, so you should be able to glean some good techniques.

Here is the relevant source:

sub nc {

    croak "Usage: nc [[\\]\$SCALAR]"
        if scalar @_ > 1 or ( ref $_[0] and ref $_[0] ne 'SCALAR' ) ;

    local( $_ ) = @_ if @_ ;
    $_ = ${$_} if ref( $_ ) ;           # Replace reference with value.

    $_ = lc ;                           # Lowercase the lot.
    s{ \b (\w)   }{\u$1}gox ;           # Uppercase first letter of every word.
    s{ (\'\w) \b }{\L$1}gox ;           # Lowercase 's.

    # Name case Mcs and Macs - taken straight from NameParse.pm incl. comments.
    # Exclude names with 1-2 letters after prefix like Mack, Macky, Mace
    # Exclude names ending in a,c,i,o, or j are typically Polish or Italian

    if ( /\bMac[A-Za-z]{2,}[^aciozj]\b/o or /\bMc/o ) {
        s/\b(Ma?c)([A-Za-z]+)/$1\u$2/go ;

        # Now correct for "Mac" exceptions
        s/\bMacEvicius/Macevicius/go ;  # Lithuanian
        s/\bMacHado/Machado/go ;        # Portuguese
        s/\bMacHar/Machar/go ;
        s/\bMacHin/Machin/go ;
        s/\bMacHlin/Machlin/go ;
        s/\bMacIas/Macias/go ;  
        s/\bMacIulis/Maciulis/go ;  
        s/\bMacKie/Mackie/go ;
        s/\bMacKle/Mackle/go ;
        s/\bMacKlin/Macklin/go ;
        s/\bMacQuarie/Macquarie/go ;
        s/\bMacOmber/Macomber/go ;
        s/\bMacIn/Macin/go ;
        s/\bMacKintosh/Mackintosh/go ;
        s/\bMacKen/Macken/go ;
        s/\bMacHen/Machen/go ;
        s/\bMacisaac/MacIsaac/go ;
        s/\bMacHiel/Machiel/go ;
        s/\bMacIol/Maciol/go ;
        s/\bMacKell/Mackell/go ;
        s/\bMacKlem/Macklem/go ;
        s/\bMacKrell/Mackrell/go ;
        s/\bMacLin/Maclin/go ;
        s/\bMacKey/Mackey/go ;
        s/\bMacKley/Mackley/go ;
        s/\bMacHell/Machell/go ;
        s/\bMacHon/Machon/go ;
    }
    s/Macmurdo/MacMurdo/go ;

    # Fixes for "son (daughter) of" etc. in various languages.
    s{ \b Al(?=\s+\w)  }{al}gox ;   # al Arabic or forename Al.
    s{ \b Ap        \b }{ap}gox ;       # ap Welsh.
    s{ \b Ben(?=\s+\w) }{ben}gox ;  # ben Hebrew or forename Ben.
    s{ \b Dell([ae])\b }{dell$1}gox ;   # della and delle Italian.
    s{ \b D([aeiu]) \b }{d$1}gox ;      # da, de, di Italian; du French.
    s{ \b De([lr])  \b }{de$1}gox ;     # del Italian; der Dutch/Flemish.
    s{ \b El        \b }{el}gox unless $SPANISH ;   # el Greek or El Spanish.
    s{ \b La        \b }{la}gox unless $SPANISH ;   # la French or La Spanish.
    s{ \b L([eo])   \b }{l$1}gox ;      # lo Italian; le French.
    s{ \b Van(?=\s+\w) }{van}gox ;  # van German or forename Van.
    s{ \b Von       \b }{von}gox ;  # von Dutch/Flemish

    # Fixes for roman numeral names, e.g. Henry VIII, up to 89, LXXXIX
    s{ \b ( (?: [Xx]{1,3} | [Xx][Ll]   | [Ll][Xx]{0,3} )?
            (?: [Ii]{1,3} | [Ii][VvXx] | [Vv][Ii]{0,3} )? ) \b }{\U$1}gox ;

    $_ ;
}
Robert Krimen
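Since the question asked for JavaScript, here is a rough port of part of the Perl above; a sketch rather than a faithful translation. Assumptions: only a handful of the Mac exceptions and particle rules are carried over, the $SPANISH switch is left out, and nameCase, upperFirst and MAC_EXCEPTIONS are made-up names. JavaScript's \b and \w are ASCII-only here, so accented names may not behave the way they do in Perl.

```javascript
// Subset of the Perl exception list, for illustration.
var MAC_EXCEPTIONS = [
    ["MacHado", "Machado"],
    ["MacKie", "Mackie"],
    ["MacQuarie", "Macquarie"]
];

function upperFirst(s) {
    return s.charAt(0).toUpperCase() + s.slice(1);
}

// Hypothetical partial port of nc().
function nameCase(name) {
    var s = name.toLowerCase();                                          // Lowercase the lot.
    s = s.replace(/\b\w/g, function (c) { return c.toUpperCase(); });    // Uppercase first letter of every word.
    s = s.replace(/'\w\b/g, function (m) { return m.toLowerCase(); });   // Lowercase 's.

    // Name case Mcs and Macs, using the same guard as the Perl.
    if (/\bMac[A-Za-z]{2,}[^aciozj]\b/.test(s) || /\bMc/.test(s)) {
        s = s.replace(/\b(Ma?c)([A-Za-z]+)/g, function (whole, prefix, rest) {
            return prefix + upperFirst(rest);
        });
        MAC_EXCEPTIONS.forEach(function (pair) {
            s = s.replace(new RegExp("\\b" + pair[0], "g"), pair[1]);
        });
    }

    // A few of the "son (daughter) of" fixes.
    s = s.replace(/\bAp\b/g, "ap");
    s = s.replace(/\bD([aeiu])\b/g, "d$1");
    s = s.replace(/\bVan(?=\s+\w)/g, "van");
    s = s.replace(/\bVon\b/g, "von");

    // Roman numerals up to LXXXIX; empty matches are replaced by themselves.
    var roman = /\b((?:[Xx]{1,3}|[Xx][Ll]|[Ll][Xx]{0,3})?(?:[Ii]{1,3}|[Ii][VvXx]|[Vv][Ii]{0,3})?)\b/g;
    s = s.replace(roman, function (m) { return m.toUpperCase(); });

    return s;
}

console.log(nameCase("MCCARTHY"));         // McCarthy
console.log(nameCase("O'CALLAGHAN"));      // O'Callaghan
console.log(nameCase("AP LLWYD DAFYDD"));  // ap Llwyd Dafydd
console.log(nameCase("henry viii"));       // Henry VIII
```

Like the Perl, the Roman numeral rule also upper-cases short words that happen to look like numerals (a surname "Li" becomes "LI"), so it is worth checking against your own data.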