How do I find all CamelCased words in a document with a regular expression? I'm only concerned with Upper camel case (i.e., camel cased words in which the first letter is capitalized).
([A-Z][a-z0-9]+)+
This assumes English text. Use Unicode-aware character classes if you need it to work with other languages. Note that it also matches single-hump words such as "This". If you only want to match words with at least two capitals, use
([A-Z][a-z0-9]+){2,}
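A minimal Ruby sketch of how these two patterns behave when scanning a whole document. The `\b` word boundaries and the non-capturing `(?:...)` groups are additions for illustration, not part of the patterns above: without `\b` the first pattern would also pick up the `Bar` inside `fooBar`, and with a capturing group `String#scan` returns the group contents rather than the whole match. The sample text and variable names are made up.

```ruby
# Assumption: \b and (?:...) are added here so that String#scan
# returns whole words; the core patterns are the ones given above.
one_or_more_humps = /\b(?:[A-Z][a-z0-9]+)+\b/
two_or_more_humps = /\b(?:[A-Z][a-z0-9]+){2,}\b/

text = "This is a FooBar and a FooBarBaz, not a fooBar."

one_plus = text.scan(one_or_more_humps)
# => ["This", "FooBar", "FooBarBaz"]

two_plus = text.scan(two_or_more_humps)
# => ["FooBar", "FooBarBaz"]
```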
UPDATE: As I mentioned in a comment, a better version is:
[A-Z]([A-Z0-9]*[a-z][a-z0-9]*[A-Z]|[a-z0-9]*[A-Z][A-Z0-9]*[a-z])[A-Za-z0-9]*
It matches strings that start with an uppercase letter, contain only letters and numbers, and contain at least one lowercase letter and at least one other uppercase letter.
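As a rough check of that description, here is a Ruby sketch that scans a sample sentence with the updated pattern. The surrounding `\b` anchors and the switch to a non-capturing group are assumptions added so that `String#scan` returns whole words; the sample sentence is invented.

```ruby
# The updated pattern, wrapped in \b...\b (an addition for this sketch)
# and using (?:...) so that scan yields the full match.
upper_camel = /\b[A-Z](?:[A-Z0-9]*[a-z][a-z0-9]*[A-Z]|[a-z0-9]*[A-Z][A-Z0-9]*[a-z])[A-Za-z0-9]*\b/

text = "IFoo wraps HTTPConnection; FooBar and Html5Parser match, " \
       "but This, HTTP and fooBar do not."

matches = text.scan(upper_camel)
# => ["IFoo", "HTTPConnection", "FooBar", "Html5Parser"]
```

"This" is rejected because it has no second uppercase letter, "HTTP" because it has no lowercase letter, and "fooBar" because it does not start with an uppercase letter.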
([A-Z][a-z\d]+)+
This should do the trick for upper camel case. You can also allow leading underscores if you want to treat something like _IsRunning as upper camel case.
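One possible way to allow the leading underscores, sketched in Ruby. The `_*` prefix, the `\b` anchors and the non-capturing group are assumptions made for this example, and the sample text is invented.

```ruby
# Same core pattern as above, with optional leading underscores.
# \b and (?:...) are added so that scan returns whole identifiers.
with_underscores = /\b_*(?:[A-Z][a-z\d]+)+\b/

text = "_IsRunning and IsRunning but not _isRunning"

found = text.scan(with_underscores)
# => ["_IsRunning", "IsRunning"]
```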
This seems to do it:
/^[A-Z][a-z]+([A-Z][a-z]+)+/
I've included Ruby unit tests:
require 'test/unit'

REGEX = /^[A-Z][a-z]+([A-Z][a-z]+)+/

class RegExpTest < Test::Unit::TestCase
  # more readable helper
  def self.test(name, &block)
    define_method("test #{name}", &block)
  end

  test "matches camelcased word" do
    assert 'FooBar'.match(REGEX)
  end

  test "does not match words starting with lower case" do
    assert ! 'fooBar'.match(REGEX)
  end

  test "does not match words without camel hump" do
    assert ! 'Foobar'.match(REGEX)
  end

  test "matches multiple humps" do
    assert 'FooBarFizzBuzz'.match(REGEX)
  end
end
Adam Crume's regex is close, but it won't match, for example, IFoo or HTTPConnection. Not sure about the others, but give this one a try:
\b[A-Z][a-z]*([A-Z][a-z]*)*\b
The same caveats as for Adam's answer regarding digits, I18N, underscores etc.
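To make those caveats concrete, here is a small Ruby sketch of what this pattern picks up. The only change is a non-capturing group so that `String#scan` returns whole words; the sample sentence is invented.

```ruby
# The pattern from this answer, with (?:...) instead of (...).
pattern = /\b[A-Z][a-z]*(?:[A-Z][a-z]*)*\b/

text = "A Foo2Bar differs from IFoo, HTTPConnection and HTTP."

words = text.scan(pattern)
# => ["A", "IFoo", "HTTPConnection", "HTTP"]
```

It finds IFoo and HTTPConnection, but it also accepts any capitalized or all-caps word such as "A" or "HTTP", and it skips Foo2Bar because digits are not in the character classes.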