I was reading an article by Martin Fowler on Composed Regular Expressions. The idea is that you take code such as this:
const string pattern = @"^score\s+(\d+)\s+for\s+(\d+)\s+nights?\s+at\s+(.*)";
And break it out into something more like this:
protected override string GetPattern() {
const string pattern =
@"^score
\s+
(\d+) # points
\s+
for
\s+
(\d+) # number of nights
\s+
night
s? # optional plural
\s+
at
\s+
(.*) # hotel name
";
return pattern;
}
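One detail worth spelling out for the commented version: in .NET, the line breaks and `# ...` comments inside the pattern are only ignored when the regex is used with `RegexOptions.IgnorePatternWhitespace`; without that option they are matched literally. Below is a minimal sketch of how the pattern might be used. The class name `ComposedRegexDemo`, the method `Parse`, and the sample input are my own assumptions for illustration, not something taken from the article.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper, named only for this illustration.
public static class ComposedRegexDemo
{
    // The commented pattern from above. Unescaped whitespace and
    // "# ..." comments are skipped only under IgnorePatternWhitespace.
    private const string Pattern = @"^score
        \s+
        (\d+)   # points
        \s+
        for
        \s+
        (\d+)   # number of nights
        \s+
        night
        s?      # optional plural
        \s+
        at
        \s+
        (.*)    # hotel name
        ";

    // Matches one input line against the commented pattern.
    public static Match Parse(string line)
    {
        return Regex.Match(line, Pattern, RegexOptions.IgnorePatternWhitespace);
    }
}
```

Under these assumptions, calling `ComposedRegexDemo.Parse("score 400 for 2 nights at Minas Tirith Airport")` should succeed, with groups 1 to 3 holding the points, the number of nights, and the hotel name.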
Or this:
const string scoreKeyword = @"^score\s+";
const string numberOfPoints = @"(\d+)";
const string forKeyword = @"\s+for\s+";
const string numberOfNights = @"(\d+)";
const string nightsAtKeyword = @"\s+nights?\s+at\s+";
const string hotelName = @"(.*)";
const string pattern = scoreKeyword + numberOfPoints +
forKeyword + numberOfNights + nightsAtKeyword + hotelName;
Or even this:
const string space = @"\s+";
const string start = "^";
const string numberOfPoints = @"(\d+)";
const string numberOfNights = @"(\d+)";
const string nightsAtKeyword = @"nights?\s+at";
const string hotelName = @"(.*)";
const string pattern = start + "score" + space + numberOfPoints + space +
"for" + space + numberOfNights + space + nightsAtKeyword +
space + hotelName;
This sounds easy enough to do and might have some benefits. I am by no means a "regex guy" and frequently find myself searching the net for the regex I need rather than taking the time to compose it myself. Having found just the right regex for a given problem, I copy and paste it in and test it to make sure it does as advertised. I might then add a comment describing what it does, and then I am off to bigger and better things.
I am wondering whether breaking a regex pattern up in the way described in Martin Fowler's article actually makes it easier to understand than a comment would. At the end of the day you still have an ugly regex in your code, only now it is in many pieces. If you ever need to extend that expression, how does this really help you understand what the regex is doing?
I know all the die-hard Perl guys out there love their regex patterns. But for those who don't deal with a regex more than once every other project: do you find a regex pattern broken into smaller pieces more or less readable? Has anyone used this approach in a project and found it useful, or not useful?