Users of our website often type in a lot of garbage for the name and address information, e.g. all caps, all lower case, etc.

It looks a lot better if we fix the case for them, but can anyone suggest a good way of doing this? A simple approach is to capitalise each word in their name, but this fails for some names. Here are a few examples:

  • bob mcdonald
  • sarah o'connor
  • MR PETE SMITH

And here is what I would like to transform them into:

  • Bob McDonald
  • Sarah O'Connor
  • Mr Pete Smith

I'm using PHP if it helps.

+9  A: 

Leave it as it is. If users don't respect themselves enough to write their names correctly, why should you care?

Having said that, you could write a subroutine to post-process the names; it will handle some common cases:

  • Capitalize first letters of words excluding non-capitalizable words like "von"
  • Look for specific patterns and custom-update matched words (e.g. capitalize the third letter if the word starts with "mc")
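The two rules above could be sketched roughly as follows. This is only a minimal illustration: the function name `fix_name_case`, the list of lowercase particles, and the apostrophe rule (added to cover the "o'connor" example from the question) are assumptions, not a complete solution. It also uses the byte-oriented string functions, so names with accented letters would need the mbstring equivalents.

```php
<?php
// Hypothetical helper; the particle list and the patterns are illustrative only.
function fix_name_case(string $name): string
{
    // Words that stay lowercase (an assumed, incomplete list).
    $lowercaseWords = ['von', 'van', 'de'];

    // Lowercase everything first, then split on runs of whitespace.
    $words = preg_split('/\s+/', trim(strtolower($name)));

    foreach ($words as $i => $word) {
        if (in_array($word, $lowercaseWords, true)) {
            continue; // leave particles such as "von" untouched
        }

        $word = ucfirst($word);

        // Custom pattern: capitalize the third letter of "mc..." names.
        if (strlen($word) > 2 && strpos($word, 'Mc') === 0) {
            $word = 'Mc' . ucfirst(substr($word, 2));
        }

        // Custom pattern: capitalize the letter after an apostrophe.
        $word = preg_replace_callback(
            "/'([a-z])/",
            function ($m) {
                return "'" . strtoupper($m[1]);
            },
            $word
        );

        $words[$i] = $word;
    }

    return implode(' ', $words);
}

echo fix_name_case('bob mcdonald'), PHP_EOL;   // Bob McDonald
echo fix_name_case("sarah o'connor"), PHP_EOL; // Sarah O'Connor
echo fix_name_case('MR PETE SMITH'), PHP_EOL;  // Mr Pete Smith
```

Note that a rule like this will also "correct" someone who really does spell their name Mcdonald, which is why the manual check still matters.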

Given the complexity of the problem, I think you would have to resort to manually editing names after the correction algorithm has run. A user registers, the name is post-processed, and it is then added to a moderation queue where you check it and update it as needed before it gets used and perhaps printed on invoices and parcel labels. Of course, if you are sure you know what you are doing.

Developer Art
+3  A: 

You can use $name = ucwords(strtolower($name)); in PHP to get you close to what you are wanting.
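For illustration, here is that one-liner applied to the three sample names from the question. By default `ucwords` only treats whitespace as a word boundary, so the letter after "Mc" or after an apostrophe stays lowercase:

```php
<?php
$names = ['bob mcdonald', "sarah o'connor", 'MR PETE SMITH'];

foreach ($names as $name) {
    // Lowercase everything, then uppercase the first letter of each word.
    echo ucwords(strtolower($name)), PHP_EOL;
}
// Bob Mcdonald
// Sarah O'connor
// Mr Pete Smith
```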

meme
This was the approach I was thinking of, but the other comments regarding correctness made me stop posting altogether. This would probably get you 90% of the way there, at least for a generic US population. Then checking against a lookup table could massage this a bit more. Maybe only correct the case for names that are all lower or all upper case, and assume the case is correct for names that come across in mixed case? Just a thought.
ChronoFish
What I might do is detect whether the name is all lower case or all upper case. If so, do this. If they have used a mixture of case, then assume they did the work for me and leave it untouched.
rikh
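A minimal sketch of that check might look like the following. The function name `normalize_if_single_case` is made up for this example, and the comparison relies on the byte-oriented `strtolower`/`strtoupper`, so it is only a starting point for non-ASCII names.

```php
<?php
// Hypothetical helper: only rewrite names typed entirely in one case.
function normalize_if_single_case(string $name): string
{
    $isAllLower = ($name === strtolower($name));
    $isAllUpper = ($name === strtoupper($name));

    if ($isAllLower || $isAllUpper) {
        return ucwords(strtolower($name));
    }

    // Mixed case: assume the user typed it deliberately.
    return $name;
}

echo normalize_if_single_case('MR PETE SMITH'), PHP_EOL; // Mr Pete Smith
echo normalize_if_single_case('Aric TenEyck'), PHP_EOL;  // Aric TenEyck
```

Dropping the all-lowercase check leaves a stricter variant that only rewrites ALL-CAPS input.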
+6  A: 

There is no correct way to do this, or to put it more specifically, every method you choose will be wrong.

Sometimes O'Connor will be customarily spelled by its owner as O'connor. Or alternatively someone could have assumed it was O'Connor when it's actually Oconnor, or vice versa. Sometimes McDonald is Mcdonald. You could legally have the name "bob smith", where all the letters are lowercase.

I think the only fix I would do is to change ALL-CAPS to First Letter Capitalization. Otherwise, just leave it alone, because there's no way to distinguish a cruddy speller from someone who decided that they wanted a crazy name.

jprete
I like your suggestion of checking for all caps and doing something in that case.
rikh
+4  A: 

And of course, there's people like me with a capital letter in the middle of their name. You'll insult me if you try to 'correct' it. Like 'Developer Art' said, if people don't respect themselves enough to write their names correctly, it's not your job to fix it.

Aric TenEyck