So I've purposely stayed away from regex because just looking at it kills me... ugh. But now I need it and could really use some help doing this in .NET (C# or VB.NET). I need to split a string based on capitalization or lack thereof. For example:

I'm not upPercase

  1. "I"
  2. "'m not up"
  3. "P"
  4. "ercase"

or

FBI Agent Winters

  1. "FBI A"
  2. "gent "
  3. "W"
  4. "inters"

The reason I'm doing this is to create small caps manually: the non-capitalized pieces will be converted to uppercase and their font size set to 80% of the original. I'd appreciate any help here.

+2  A: 

I think this regular expression should work: `/([A-Z ]*)([^A-Z]*)/`

According to rubular.com, it produces exactly those splits on your sample data.
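For reference, here is a minimal sketch of the same pattern in .NET, written as a C# 9+ console program with top-level statements. .NET patterns are written without the surrounding slashes, and because both groups use `*`, the regex also yields an empty match at the end of the input, so the sketch skips empty groups. `SplitByCase` is just an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Split text into alternating "capitals" / "non-capitals" pieces using the pattern
// from this answer. Both groups use *, so the regex can also produce an empty match
// (e.g. at the end of the string); empty groups are skipped.
static List<string> SplitByCase(string input)
{
    var pieces = new List<string>();
    foreach (Match m in Regex.Matches(input, @"([A-Z ]*)([^A-Z]*)"))
    {
        // Group 1: a run of capitals and spaces; group 2: everything up to the next capital.
        if (m.Groups[1].Length > 0) pieces.Add(m.Groups[1].Value);
        if (m.Groups[2].Length > 0) pieces.Add(m.Groups[2].Value);
    }
    return pieces;
}

foreach (string piece in SplitByCase("FBI Agent Winters"))
    Console.WriteLine($"\"{piece}\"");
```

For `"I'm not upPercase"` this yields `"I"`, `"'m not up"`, `"P"`, `"ercase"`, which matches the split in the question. Note that the space inside `[A-Z ]` is what keeps `"FBI A"` together as one piece.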

jkeesh
Great site, I've bookmarked that one for future reference. I slightly changed your pattern and it works like a charm: `/([A-Z]+)([^A-Z]+)/`
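One thing to watch with the `+` version: both groups must be non-empty, so every match has to be a run of capitals followed by at least one non-capital. Text before the first capital, or capitals at the very end of the string, is silently skipped. The sketch below (C# 9+; `Pieces` is a hypothetical helper) shows this, along with an alternation variant, `([A-Z]+)|([^A-Z]+)`, which is an assumption for illustration and not part of either post; it matches each kind of run on its own so no text is lost.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Collect the successful, non-empty capture groups of every match, in order.
static string[] Pieces(string input, string pattern) =>
    Regex.Matches(input, pattern)
         .Cast<Match>()
         .SelectMany(m => m.Groups.Cast<Group>().Skip(1)) // skip group 0 (the whole match)
         .Where(g => g.Success && g.Length > 0)
         .Select(g => g.Value)
         .ToArray();

// Pattern from the comment: capitals immediately followed by non-capitals.
const string Paired = @"([A-Z]+)([^A-Z]+)";
// Hypothetical alternative: either kind of run on its own.
const string Either = @"([A-Z]+)|([^A-Z]+)";

Console.WriteLine(string.Join("|", Pieces("FBI Agent Winters", Paired))); // FBI| |A|gent |W|inters
Console.WriteLine(string.Join("|", Pieces("the FBI", Paired)));           // prints an empty line: nothing matches
Console.WriteLine(string.Join("|", Pieces("the FBI", Either)));           // the |FBI
```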
Otaku
+3  A: 

Sounds to me like you just need to match anything that's not an uppercase letter. For example:

```csharp
using System.Text.RegularExpressions;

input = Regex.Replace(input, @"[^A-Z]+", ToSmallCaps);
```

...where ToSmallCaps is a method compatible with the MatchEvaluator delegate (it takes a Match and returns the replacement string) that converts the matched text to small caps, however you're doing that. For example:

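```csharp
static string ToSmallCaps(Match m)
{
    // "whatever" is a placeholder for your small-caps styling;
    // the matched text is wrapped unchanged.
    return String.Format(@"<span style=""whatever"">{0}</span>", m.Value);
}
```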
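Putting the two pieces together, a minimal self-contained sketch might look like this (C# 9+ top-level statements). It assumes the output is HTML and that small caps are rendered as an inline `font-size:80%` span, as described in the question; the lambda plays the role of the `MatchEvaluator`, and `ToSmallCapsHtml` is a hypothetical name.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Render "small caps" as HTML: capital letters are left alone, and every run of
// non-capitals is upper-cased and wrapped in a span at 80% of the font size.
// The inline-CSS span is an assumption about how the output will be displayed.
static string ToSmallCapsHtml(string input) =>
    Regex.Replace(input, @"[^A-Z]+", m =>
        "<span style=\"font-size:80%\">" +
        WebUtility.HtmlEncode(m.Value.ToUpperInvariant()) + // upper-case first, then escape &, <, etc.
        "</span>");

Console.WriteLine(ToSmallCapsHtml("FBI Agent Winters"));
```

Since `[^A-Z]+` matches every non-capital, the space between `FBI` and `Agent` is wrapped too; that is harmless, but it is why the output contains an extra span.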

EDIT: A more Unicode-friendly version of the regex would be `@"[^\p{Lu}\p{Lt}]+"`, which matches one or more characters that are neither uppercase nor titlecase letters, in any script.
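To see the difference the Unicode classes make, here is a small comparison (C# 9+; `LowerRuns` is an illustrative helper) on a name that starts with an accented capital. With the ASCII class, `É` counts as a non-capital and would be shrunk along with the rest of the word.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Return the runs of "non-capital" text that a pattern would shrink.
static string[] LowerRuns(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string name = "\u00C9mile Zola"; // "Émile Zola", escaped to avoid source-encoding issues

// ASCII-only class: É is not in A-Z, so it is treated as a non-capital.
Console.WriteLine(string.Join("|", LowerRuns(name, @"[^A-Z]+")));          // Émile |ola
// Unicode classes: É is an uppercase letter (\p{Lu}), so it is left alone.
Console.WriteLine(string.Join("|", LowerRuns(name, @"[^\p{Lu}\p{Lt}]+"))); // mile |ola
```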

Alan Moore
Thanks, Alan. I need to match both runs of one or more capital letters and runs of one or more characters that are not capitals.
Otaku
+2  A: 

Although Alan's answer will probably solve your problem, for completeness' sake, I'm posting a regex that returns both the uppercase and the lowercase parts as matches, like in your example.

ASCII letters only:

```csharp
Regex.Matches("I'm not upPercase", @"[^a-z]+|[^A-Z]+");
```

Unicode:

```csharp
Regex.Matches("I'm not upPercase", @"[^\p{Ll}]+|[^\p{Lu}]+");
```
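A sketch of how these matches could feed the small-caps step (C# 9+ top-level statements). The rule that a run containing any lowercase letter is the one to upper-case and shrink is an assumption for illustration, as are the helper names. Note that the apostrophe ends up in the first run (`"I'"`) rather than the second, because the first alternative consumes every non-lowercase character; punctuation has no case, so this doesn't affect the rendering.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split into alternating runs: text with no lowercase letters, or text with no uppercase letters.
static string[] SplitByCase(string input) =>
    Regex.Matches(input, @"[^\p{Ll}]+|[^\p{Lu}]+").Cast<Match>().Select(m => m.Value).ToArray();

// Hypothetical small-caps step: a run containing lowercase letters is upper-cased
// and flagged as small (80%); every other run is kept as-is at full size.
static (string Text, bool Small)[] ToSmallCapsRuns(string input) =>
    SplitByCase(input)
        .Select(s => s.Any(char.IsLower) ? (s.ToUpperInvariant(), true) : (s, false))
        .ToArray();

Console.WriteLine(string.Join("|", SplitByCase("I'm not upPercase"))); // I'|m not up|P|ercase
foreach (var (text, small) in ToSmallCapsRuns("I'm not upPercase"))
{
    string size = small ? "80%" : "100%";
    Console.WriteLine($"\"{text}\" at {size}");
}
```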
Max Shawabkeh
Thanks for the Unicode addition.
Otaku