I'm writing some code that needs to do string normalisation: I want to turn a given string into a camel-case representation (well, a best guess at least). Example:

"the quick brown fox" => "TheQuickBrownFox"
"the_quick_brown_fox" => "TheQuickBrownFox"
"123The_quIck bROWN FOX" => "TheQuickBrownFox"
"the_quick brown fox 123" => "TheQuickBrownFox123"
"thequickbrownfox" => "Thequickbrownfox"

I think you should be able to get the idea from those examples. I want to strip out all special characters (', ", !, @, ., etc.), capitalise every word (words are delimited by a space, _ or -) and drop any leading numbers (trailing/internal ones are OK; this requirement isn't vital, depending on the difficulty).

I'm trying to work out what would be the best way to achieve this. My first guess would be with a regular expression, but my regex skills are bad at best so I wouldn't really know where to start.

My other idea would be to loop and parse the data: break it down into words, parse each one, and rebuild the string that way.

Or is there another way in which I could go about it?

A: 

Thought it'd be fun to try it; here's what I came up with:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder sb = new StringBuilder();
            string sentence = "123The_quIck bROWN FOX1234";

            sentence = sentence.ToLower();

            char[] s = sentence.ToCharArray();

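            bool atStart = true;   // true until the first letter has been written
            char pChar = ' ';      // previous character; starts as a separator

            char[] spaces = { ' ', '_', '-' };
            char a;
            foreach (char c in s)
            {
                // Drop leading digits.
                if (atStart && char.IsDigit(c)) continue;

                if (char.IsLetter(c))
                {
                    a = c;
                    // Capitalise the first letter overall and any letter that follows a separator.
                    if (atStart || spaces.Contains(pChar))
                        a = char.ToUpper(a);
                    sb.Append(a);
                    atStart = false;
                }
                else if(char.IsDigit(c))
                {
                    sb.Append(c);
                }
                // Anything else (punctuation, separators) is skipped.
                pChar = c;
            }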

            Console.WriteLine(sb.ToString());
            Console.ReadLine();
        }
    }
}
John Boker
Jeez, I think you and I almost arrived at the exact same place!
Daniel LeCheminant
+1  A: 

This regex matches all words. We then Aggregate them with a method that capitalizes the first character of each word and lower-cases the rest of it.

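using System.Linq;
using System.Text.RegularExpressions;

// One or more ASCII letters; + rather than * avoids empty matches.
Regex regex = new Regex(@"[a-zA-Z]+", RegexOptions.Compiled);

private string CamelCase(string str)
{
    // Concatenate the capitalised form of every matched word.
    return regex.Matches(str).OfType<Match>().Aggregate("", (s, match) => s + CamelWord(match.Value));
}

private string CamelWord(string word)
{
    if (string.IsNullOrEmpty(word))
        return "";

    // Upper-case the first character, lower-case the rest.
    return char.ToUpper(word[0]) + word.Substring(1).ToLower();
}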

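This method ignores numbers, by the way. To add them, you can change the regex to @"[a-zA-Z]+|[0-9]+", I suppose, but I haven't tested it. (Note the + quantifiers: with * the first alternative always succeeds on an empty match, so the digit branch would never be tried.) Leading digits would then be kept as well.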
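
For what it's worth, here is a minimal sketch of that digit-aware variant which also drops the leading digits the question asks about. The names CamelCaseSketch and Token are made up for illustration, and it only handles ASCII letters and digits:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CamelCaseSketch
{
    // Runs of ASCII letters or runs of digits; everything else acts as a separator.
    private static readonly Regex Token = new Regex("[a-zA-Z]+|[0-9]+");

    public static string CamelCase(string input)
    {
        // Drop everything before the first letter (leading digits and symbols).
        string trimmed = Regex.Replace(input, "^[^a-zA-Z]+", "");

        return string.Concat(
            Token.Matches(trimmed)
                 .Cast<Match>()
                 .Select(m => CamelWord(m.Value)));
    }

    private static string CamelWord(string word)
    {
        // Digit runs pass through unchanged; letter runs become "Xxxx".
        return char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant();
    }
}
```

One caveat with this approach: every non-alphanumeric character splits a word, and so does a run of digits, so "it's" comes out as "ItS" and "fox1a" as "Fox1A".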

configurator
+3  A: 

How about a simple solution using Strings.StrConv in the Microsoft.VisualBasic namespace? (Don't forget to add a Project Reference to Microsoft.VisualBasic):

using System;
using VB = Microsoft.VisualBasic;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Prints "Quick Brown": ProperCase capitalises each word but keeps the separators.
            Console.WriteLine(VB.Strings.StrConv("QUICK BROWN", VB.VbStrConv.ProperCase, 0));
            Console.ReadLine();
        }
    }
}
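
If you would rather not reference Microsoft.VisualBasic, TextInfo.ToTitleCase in System.Globalization does a similar job. A minimal sketch, where the ProperCaseSketch class and ProperCase method names are made up for illustration and the invariant culture is an assumption:

```csharp
using System;
using System.Globalization;

static class ProperCaseSketch
{
    public static string ProperCase(string input)
    {
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        // ToTitleCase leaves all-uppercase words alone (it treats them as acronyms),
        // so lower-case the input first.
        return textInfo.ToTitleCase(textInfo.ToLower(input));
    }
}
```

Like StrConv, this only fixes the capitalisation; spaces and other separators still have to be removed in a separate step.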
Mitch Wheat
Wow! This is a good one...
Codex
Thanks, that plus the other solutions for handling the invalid characters did it nicely.
Slace
+1  A: 

Any solution that involves matching particular characters may not work well with some character encodings, particularly if Unicode is being used, which has dozens of space characters, thousands of 'symbols', thousands of punctuation characters, thousands of 'letters', etc. It would be better, wherever possible, to use built-in Unicode-aware functions. As for what counts as a 'special character', you could decide based on Unicode categories. For instance, it would include 'Punctuation', but would it include 'Symbols'?

ToLower(), IsLetter(), etc. should be fine, as they take into account all possible letters in Unicode. Matching against separators such as spaces and dashes should probably take into account some of the dozens of space and dash characters in Unicode.
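
As a rough sketch of what that could look like in C#, the loop below decides on separators by Unicode category instead of a fixed character list. The class and method names are made up for illustration, and treating the DashPunctuation and ConnectorPunctuation categories (the latter contains '_') as word separators is my assumption, not a rule:

```csharp
using System;
using System.Globalization;
using System.Text;

static class UnicodeCamelCaseSketch
{
    // Treat any Unicode white space, dash, or connector (such as '_') as a word separator.
    private static bool IsSeparator(char c)
    {
        if (char.IsWhiteSpace(c)) return true;
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        return category == UnicodeCategory.DashPunctuation
            || category == UnicodeCategory.ConnectorPunctuation;
    }

    public static string CamelCase(string input)
    {
        var result = new StringBuilder();
        bool startOfWord = true;   // the next letter begins a new word
        bool seenLetter = false;   // digits before the first letter are dropped

        foreach (char c in input)
        {
            if (char.IsLetter(c))
            {
                result.Append(startOfWord ? char.ToUpperInvariant(c) : char.ToLowerInvariant(c));
                startOfWord = false;
                seenLetter = true;
            }
            else if (char.IsDigit(c))
            {
                if (seenLetter) result.Append(c);
            }
            else if (IsSeparator(c))
            {
                startOfWord = true;
            }
            // Any other character (punctuation, symbols) is simply skipped.
        }
        return result.ToString();
    }
}
```

This works one UTF-16 char at a time, so characters outside the Basic Multilingual Plane (surrogate pairs) are not recognised as letters and get dropped.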

thomasrutter
+1  A: 

You could wear ruby slippers to work :)

def camelize str
  str.gsub(/^[^a-zA-Z]*/, '').split(/[^a-zA-Z0-9]/).map(&:capitalize).join
end
ben_h