I have some strings, entered by users, that may look like this:
- ++7
- 7++
- 1++7
- 1+7
- 1++7+10++15+20+30++
These are meant to be read as:
- Anything up to and including 7
- Anything from 7 and up
- 1 and 7 and anything in between
- 1 and 7 only
- 1 to 7, 10 to 15, 20, and 30 and above
I need to parse those strings into actual ranges. That is, I need to create a list of objects of type Range, each of which has a start and an end. For single items I just set the start and end to the same value, and for those that are open above or below, I set the end or the start to null. For example, for the first string I would get one range with start set to null and end set to 7.
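To make the target concrete, here is a minimal sketch of the result I would expect for the last example string. `IntRange` and `ExpectedRanges` are hypothetical, simplified stand-ins made up for illustration (int only, null meaning "open on that side"); they are not my actual `Range<T>` type.

```csharp
using System;

// Hypothetical stand-in for the real Range type: null means "open" on that side.
public sealed class IntRange
{
    public int? Start { get; }
    public int? End { get; }

    public IntRange(int? start, int? end)
    {
        Start = start;
        End = end;
    }

    public override string ToString()
    {
        string s = Start.HasValue ? Start.Value.ToString() : "null";
        string e = End.HasValue ? End.Value.ToString() : "null";
        return "[" + s + ", " + e + "]";
    }
}

public static class ExpectedRanges
{
    // The ranges I would expect from parsing "1++7+10++15+20+30++".
    public static IntRange[] ForLastExample()
    {
        return new[]
        {
            new IntRange(1, 7),     // 1++7
            new IntRange(10, 15),   // 10++15
            new IntRange(20, 20),   // single item: start == end
            new IntRange(30, null), // 30++ : open-ended
        };
    }
}
```

So `++7` on its own would be a single range with a null start and an end of 7.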
I currently have a kind of messy method using a regular expression to do this splitting and parsing and I want to simplify it. My problem is that I need to split on + first, and then on ++. But if I split on + first, then the ++ instances are ruined and I end up with a mess.
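A small sketch of what goes wrong with the naive split, using only `string.Split`. `NaiveSplitDemo` is a made-up helper name for illustration, not part of my code.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Splitting on a single '+' leaves each "++" behind only as an empty entry.
    public static string[] SplitOnPlus(string subject)
    {
        return subject.Split('+');
    }
}
```

For `"1++7+10"` this gives the four entries `"1"`, `""`, `"7"`, `"10"`, and for `"++7"` it gives `""`, `""`, `"7"`. The `++` is no longer a token of its own, so I would have to piece it back together from the empty entries.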
Looking at those strings, it should be really easy to parse them; I just can't come up with a smart way to do it. There just has to be an easier (cleaner, easier to read) way, probably involving some simple concept I just haven't heard about before :P
The regular expression looks like this:
    private readonly Regex Pattern = new Regex(@"
            ( [+]{2,} )?
            ([^+]+)
            (?:
                (?: [+]{2,} [^+]* )*
                [+]{2,} ([^+]+)
            )?
            ( [+]{2,} )?", RegexOptions.IgnorePatternWhitespace);
That is then used like this:
    public IEnumerable<Range<T>> Parse(string subject, TryParseDelegate<string, T> itemParser)
    {
        if (string.IsNullOrEmpty(subject))
            yield break;

        // RangeStringConstants.Items is the regex shown above.
        for (var item = RangeStringConstants.Items.Match(subject); item.Success; item = item.NextMatch())
        {
            // Groups 1 and 4 are the optional leading/trailing "++" (open start/end).
            var startIsOpen = item.Groups[1].Success;
            var endIsOpen = item.Groups[4].Success;

            // Groups 2 and 3 are the first and last item of the range.
            var startItem = item.Groups[2].Value;
            var endItem = item.Groups[3].Value;

            // A single item is a range whose start and end are the same.
            if (endItem == string.Empty)
                endItem = startItem;

            // Skip anything the item parser cannot handle.
            T start, end;
            if (!itemParser(startItem, out start) || !itemParser(endItem, out end))
                continue;

            yield return Range.Create(startIsOpen ? default(T) : start,
                                      endIsOpen ? default(T) : end);
        }
    }
It works, but I don't think it is particularly readable or maintainable. For example, changing the '+' and '++' into ',' and '-' would not be trivial to do.