I have a long string that I need to parse into an array of strings that do not exceed 40 characters in length. The tricky part of this for me is making sure that the regex finds the last whitespace before 40 characters to make a clean break between strings since I don't want words cut off.
A:
This regex should do the job:
".{1,40}( |$)"
(Quotes are for the string literal.)
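This tells the regex engine to greedily match any character between 1 and 40 times (i.e. as many as possible), followed by a single space or the end of the string. Note that each match includes the trailing space, so trim the matches afterward if you need every piece to stay within 40 characters.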
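Here is a rough sketch of how that pattern could be applied. The question doesn't name a language, so Python is an assumption here, and `split_at_spaces` and `sample` are made-up names for illustration:

```python
import re

# Pattern from the answer: up to 40 characters (greedy), then a space or end of string.
CHUNK = re.compile(r".{1,40}( |$)")

def split_at_spaces(text):
    """Return each match with its trailing space trimmed off."""
    return [m.group(0).rstrip() for m in CHUNK.finditer(text)]

sample = "The quick brown fox jumps over the lazy dog and keeps running through the forest"
for piece in split_at_spaces(sample):
    print(len(piece), piece)
```

One thing to watch for: if the input contains a single word longer than 40 characters, this pattern cannot match at the start of that word, so the scan moves forward and the leading characters of the word are skipped rather than returned.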
Noldorin
2009-06-24 21:05:29
For fun I tried implementing this without Regex and boy was it ugly compared to this.
Greg
2009-06-24 21:16:49
@Greg: Yeah exactly. I'm not someone to get overly-keen with regex, but this is surely a situation where it's highly desirable!
Noldorin
2009-06-24 21:36:53
A:
Right-trim the substrings as you go:
(?<sub>.{1,40})(?:\s+|$)|(?<sub>.{40})
The first alternative tries for a clean break, but the other is there as a fallback for blindly chopping if need be. Afterward, the substrings are available in m.Groups["sub"].Captures.
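For comparison, a minimal Python sketch of the same two-alternative idea. This is an adaptation, not the pattern above verbatim: Python's `re` module does not allow two groups with the same name, so the fallback group is renamed to `chop` (a made-up name), and `wrap_40` is likewise hypothetical:

```python
import re

# First alternative: up to 40 chars followed by whitespace or end of string.
# Second alternative: a blind 40-char chop, used only when no clean break exists.
# The fallback group is called "chop" because Python rejects duplicate group names.
WRAP = re.compile(r"(?P<sub>.{1,40})(?:\s+|$)|(?P<chop>.{40})")

def wrap_40(text):
    """Split text into pieces of at most 40 characters, preferring whitespace breaks."""
    # Exactly one of the two groups is set for each match, and neither can be empty.
    return [m.group("sub") or m.group("chop") for m in WRAP.finditer(text)]

long_word = "a" * 50 + " tail"
for piece in wrap_40(long_word):
    print(len(piece), piece)
```

With the fallback in place, an over-long word gets cut at 40 characters instead of being partly skipped.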
Greg Bacon
2009-06-24 21:58:38