I'm using the regex

System.Text.RegularExpressions.Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim()

to split strings by capital letter, for example:

'MyNameIsSimon' becomes 'My Name Is Simon'

I find this incredibly useful when working with enumerations. I would like to change it slightly so that a string is split before a capital letter only if the next letter is lowercase, for example:

'USAToday' would become 'USA Today'
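
To make the problem concrete, here is a minimal sketch of what the current pattern does with both inputs. The `NaiveSplitter` class and `Split` method are names chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveSplitter
{
    // Inserts a space before every capital letter, then trims the result.
    public static string Split(string input) =>
        Regex.Replace(input, "([A-Z])", " $1").Trim();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(NaiveSplitter.Split("MyNameIsSimon")); // My Name Is Simon
        Console.WriteLine(NaiveSplitter.Split("USAToday"));      // U S A Today
    }
}
```

The acronym is broken up because every capital gets its own space, regardless of what follows it.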

Can this be done?

EDIT: Thanks to all for responding. I may not have entirely thought this through: in some cases 'A' and 'I' would need to be ignored, but that is not possible (at least not in a meaningful way). In my case, though, the answers below do what I need. Thanks!

+6  A: 

Match any uppercase character that is not followed by another uppercase character:

Regex.Replace(stringToSplit, "([A-Z])(?![A-Z])", " $1")
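
As a rough check of how the negative lookahead behaves, the sketch below runs the pattern over a few inputs taken from this thread. The `LookaheadSplitter` name is hypothetical, and the `Trim()` call is carried over from the question rather than being part of the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplitter
{
    // Inserts a space before any capital that is not followed by another capital.
    public static string Split(string input) =>
        Regex.Replace(input, "([A-Z])(?![A-Z])", " $1").Trim();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(LookaheadSplitter.Split("USAToday"));     // USA Today
        Console.WriteLine(LookaheadSplitter.Split("IAmBored"));     // I Am Bored
        Console.WriteLine(LookaheadSplitter.Split("BornInTheUSA")); // Born In The US A
    }
}
```

The last case shows a limitation: `(?![A-Z])` also succeeds at the end of the string, so the final letter of a trailing acronym gets a space in front of it.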

Edit:

I just noticed that you're using this for enumerations. I really don't encourage using string representations of enumerations like this, and the problem at hand is a good reason why. Have a look at this instead: http://www.refactoring.com/catalog/replaceTypeCodeWithClass.html
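
One way the linked refactoring might look in C# is sketched below. The `Publication` class, its members, and the display strings are all hypothetical; the point is only that each value carries its own display text instead of deriving it from the identifier.

```csharp
using System;

// Hypothetical replacement for an enum such as
// `enum Publication { USAToday, NewYorkTimes }`.
public sealed class Publication
{
    public static readonly Publication USAToday =
        new Publication("USAToday", "USA Today");
    public static readonly Publication NewYorkTimes =
        new Publication("NewYorkTimes", "New York Times");

    public string Name { get; }
    public string DisplayName { get; }

    // Private constructor: the static fields above are the only instances.
    private Publication(string name, string displayName)
    {
        Name = name;
        DisplayName = displayName;
    }

    public override string ToString() => DisplayName;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Publication.USAToday); // USA Today
    }
}
```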

David Hedlund
That doesn't handle "I", i.e. "IAmBored" will not be split as "I Am Bored" as I assume the OP would expect.
Brian Rasmussen
I think you're mistaken. Try this JavaScript for yourself: alert("IAmBored".replace(/([A-Z])(?![A-Z])/g, " $1")); It will match "A" and "B", as neither is followed by an uppercase character, and replace them with " A" and " B" respectively.
David Hedlund
(Although I just realized that you're only mistaken in your choice of example; the general point is still accurate for when the "I" is in the middle of a sentence.)
David Hedlund
It also inserts a space before the "A" in "BornInTheUSA".
Alan Moore
+1  A: 

You might think about changing the enumerations; MS coding guidelines suggest Pascal-casing acronyms as though they were words: XmlDocument, HtmlWriter, etc. Two-letter acronyms don't follow this rule, though: System.IO.

So you should be using UsaToday, and your problem will disappear.

Steve Cooper
While I'm totally with you in general, this does not really solve the problem. If he'd written UsaToday, the split (i.e. human-readable) string would be "Usa Today", which is kind of strange since it's always written USA. So I can understand the desire to retain capitalization. On the other hand, if one wants to show enum names to users, one should go with another solution. I tend to have string resources named like EnumName_ValueName, so the keys can be easily generated in code, are searchable in the resource file, and can be easily localized.
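
A minimal sketch of that resource-key idea, under some assumptions: the `Publication` enum and `EnumDisplay` helper are hypothetical, and a dictionary stands in for the resource file (in a real project the lookup would more likely go through something like `ResourceManager.GetString(key)`).

```csharp
using System;
using System.Collections.Generic;

public enum Publication { UsaToday, NewYorkTimes }

public static class EnumDisplay
{
    // Stand-in for a string resource table keyed by EnumName_ValueName.
    private static readonly Dictionary<string, string> Strings =
        new Dictionary<string, string>
        {
            ["Publication_UsaToday"] = "USA Today",
            ["Publication_NewYorkTimes"] = "New York Times",
        };

    // Builds the key from the enum type name and the value name.
    public static string ResourceKey(Enum value) =>
        value.GetType().Name + "_" + value;

    // Falls back to the raw identifier when no resource entry exists.
    public static string DisplayName(Enum value) =>
        Strings.TryGetValue(ResourceKey(value), out string text)
            ? text
            : value.ToString();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(EnumDisplay.ResourceKey(Publication.UsaToday)); // Publication_UsaToday
        Console.WriteLine(EnumDisplay.DisplayName(Publication.UsaToday)); // USA Today
    }
}
```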
OregonGhost
+3  A: 
((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))

when replaced with

" $1"

handles

TodayILiveInTheUSAWithSimon
USAToday
IAmSOOOBored

yielding

 Today I Live In The USA With Simon
USA Today
I Am SOOO Bored

In a second step you'd have to trim the string.
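
Put together, the replace-then-trim steps could be sketched like this. `CamelCaseSplitter` and `WordStart` are hypothetical names; the pattern is the one given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // A capital preceded by a lowercase letter, or a capital followed by one.
    private static readonly Regex WordStart =
        new Regex("((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))");

    // Step 1: put a space before each match. Step 2: drop the leading space.
    public static string Split(string input) =>
        WordStart.Replace(input, " $1").TrimStart();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CamelCaseSplitter.Split("TodayILiveInTheUSAWithSimon")); // Today I Live In The USA With Simon
        Console.WriteLine(CamelCaseSplitter.Split("USAToday"));                    // USA Today
        Console.WriteLine(CamelCaseSplitter.Split("IAmSOOOBored"));                // I Am SOOO Bored
    }
}
```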

Tomalak
Sorry, you lost me a bit! Like this: Replace(stringToSplit, "([A-Z])(?=[a-z])|(?<=[a-z])([A-Z])", " \1") ?
Simon
The `\1` means back-reference #1. In .NET regexes, this is expressed as `$1`. Other than that, your statement seems correct.
Tomalak
(Oh, and I have changed my regex a bit. You are using the one from an older version of the answer.)
Tomalak
I've edited the answer so it uses the .NET style back-reference.
Tomalak
`([A-Z])(?<=[a-z]\1|[A-Za-z]\1(?=[a-z]))` doesn't add the space at the beginning because it can never match the first letter. :)
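
A sketch of this variant for comparison, assuming .NET's support for backreferences and lookaheads inside a lookbehind. The `NoTrimSplitter` name is hypothetical, and no trimming step is applied.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoTrimSplitter
{
    // The lookbehind requires a character before the capital,
    // so the first letter of the string is never matched.
    private static readonly Regex WordStart =
        new Regex(@"([A-Z])(?<=[a-z]\1|[A-Za-z]\1(?=[a-z]))");

    public static string Split(string input) =>
        WordStart.Replace(input, " $1");
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(NoTrimSplitter.Split("TodayILiveInTheUSAWithSimon")); // Today I Live In The USA With Simon
        Console.WriteLine(NoTrimSplitter.Split("USAToday"));                    // USA Today
    }
}
```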
Alan Moore
A: 

Tomalak's expression worked for me, but not with VB's built-in Replace function, which does plain string replacement and so looks for the pattern text literally. Regex.Replace(), however, did work.

' Requires: Imports System.Text.RegularExpressions
For i As Integer = 0 To names.Length - 1
  ' Worked
  names(i) = Regex.Replace(names(i), "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))", " $1").TrimStart()

  ' Didn't work
  'names(i) = Replace(names(i), "([A-Z])(?=[a-z])|(?<=[a-z])([A-Z])", " $1").TrimStart()
Next

BTW, I'm using this to split the words in enumeration names for display in the UI and it works beautifully.
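
For the enumeration use case, one possible shape is an extension method, sketched here in C#. The `Publication` enum and the `ToDisplayText` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public enum Publication { USAToday, NewYorkTimes }

public static class EnumTextExtensions
{
    // Splits the enum member's identifier into words for display.
    public static string ToDisplayText(this Enum value) =>
        Regex.Replace(value.ToString(),
                      "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))",
                      " $1").TrimStart();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Publication.USAToday.ToDisplayText());     // USA Today
        Console.WriteLine(Publication.NewYorkTimes.ToDisplayText()); // New York Times
    }
}
```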

Craig Boland