ansaurus

Question

Answer 1

+2 A:

My problem is that I need to split on + first, and then on ++. But if I split on + first, then the ++ instances are ruined and I end up with a mess.

You could split on this regex first:

(?<!\+)\+(?!\+)

That way, only the 'single' +'s are being split on, leaving you to parse the ++'s. Note that I am assuming that there cannot be three successive +'s.

The regex above simple says: "split on the '+' only if there's no '+' ahead or behind it".

Edit:

After reading that there can be more than 2 successive +'s, I recommend writing a small grammar and letting a parser-generator create a lexer+parser for your little language. ANTLR can generate C# source code as well.

Edit 2:

But before implementing any solution (parser or regex) you'd first have to define what is and what isn't valid input. If you're going to let more than two successive +'s be valid, ie. 1+++++5, which is [1++, +, ++5], I'd write a little grammar. See this tutorial how that works: http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required

And if you're going to reject input of more than 2 successive +'s, you can use either Lasse's or my (first) regex-suggestion.

Bart Kiers 2009-11-16 12:26:33

I have exactly the same regex on my clipboard... +1.

Kobi 2009-11-16 12:28:12

How does a lexer+parser work?

Svish 2009-11-16 13:26:47

See my 2nd edit.

Bart Kiers 2009-11-16 13:38:59

Nice. Will try to read up on these things :) Thanks!

Svish 2009-11-16 14:02:36

Answer 2

+2 A:

Here's some code that uses regular expressions.

Note that the issue raised by Bart in the comments to your question, ie. "How do you handle 1+++5", is not handled at all.

To fix that, unless your code is already out in the wild and not subject to change of behaviour, I would suggest you change your syntax to the following:

use .. to denote ranges
allow both + and - for numbers, for positive and negative numbers
use comma and/or semicolon to separate distinct numbers or ranges
allow whitespace

Look at the difference between the two following strings:

1++7+10++15+20+30++
1..7, 10..15, 20, 30..

The second string is much easier to parse, and much easier to read.

It would also remove all ambiguity:

1+++5 = 1++ + 5 = 1.., 5
1+++5 = 1 + ++5 = 1, ..5

There's no way to parse wrong the second syntax.

Anyway, here's my code. Basically it works by adding four regex patterns for the four types of patterns:

num
num++
++num
num++num

For "num", it will handle negative numbers with a leading minus sign, and one or more digits. It does not, for obvious reasons, handle the plus sign as part of the number.

I've interpreted "and up" to mean "up to Int32.MaxValue" and same for down to Int32.MinValue.

public class Range
{
    public readonly Int32 From;
    public readonly Int32 To;

    public Range(Int32 from, Int32 to)
    {
        From = from;
        To = to;
    }

    public override string ToString()
    {
        if (From == To)
            return From.ToString();
        else if (From == Int32.MinValue)
            return String.Format("++{0}", To);
        else if (To == Int32.MaxValue)
            return String.Format("{0}++", From);
        else
            return String.Format("{0}++{1}", From, To);
    }
}

public static class RangeSplitter
{
    public static Range[] Split(String s)
    {
        if (s == null)
            throw new ArgumentNullException("s");

        String[] parts = new Regex(@"(?<!\+)\+(?!\+)").Split(s);
        List<Range> result = new List<Range>();

        var patterns = new Dictionary<Regex, Action<Int32[]>>();

        patterns.Add(new Regex(@"^(-?\d+)$"),
            values => result.Add(new Range(values[0], values[0])));
        patterns.Add(new Regex(@"^(-?\d+)\+\+$"),
            values => result.Add(new Range(values[0], Int32.MaxValue)));
        patterns.Add(new Regex(@"^\+\+(-?\d+)$"),
            values => result.Add(new Range(Int32.MinValue, values[0])));
        patterns.Add(new Regex(@"^(-?\d+)\+\+(-?\d+)$"),
            values => result.Add(new Range(values[0], values[1])));

        foreach (String part in parts)
        {
            foreach (var kvp in patterns)
            {
                Match ma = kvp.Key.Match(part);
                if (ma.Success)
                {
                    Int32[] values = ma.Groups
                        .OfType<Group>()
                        .Skip(1) // group 0 is the entire match
                        .Select(g => Int32.Parse(g.Value))
                        .ToArray();
                    kvp.Value(values);
                }
            }
        }

        return result.ToArray();
    }
}

Unit-tests:

[TestFixture]
public class RangeSplitterTests
{
    [Test]
    public void Split_NullString_ThrowsArgumentNullException()
    {
        Assert.Throws<ArgumentNullException>(() =>
        {
            var result = RangeSplitter.Split(null);
        });
    }

    [Test]
    public void Split_EmptyString_ReturnsEmptyArray()
    {
        Range[] result = RangeSplitter.Split(String.Empty);
        Assert.That(result.Length, Is.EqualTo(0));
    }

    [TestCase(01, "++7", Int32.MinValue, 7)]
    [TestCase(02, "7", 7, 7)]
    [TestCase(03, "7++", 7, Int32.MaxValue)]
    [TestCase(04, "1++7", 1, 7)]
    public void Split_SinglePatterns_ProducesExpectedRangeBounds(
        Int32 testIndex, String input, Int32 expectedLower,
        Int32 expectedUpper)
    {
        Range[] result = RangeSplitter.Split(input);
        Assert.That(result.Length, Is.EqualTo(1));
        Assert.That(result[0].From, Is.EqualTo(expectedLower));
        Assert.That(result[0].To, Is.EqualTo(expectedUpper));
    }

    [TestCase(01, "++7")]
    [TestCase(02, "7++")]
    [TestCase(03, "1++7")]
    [TestCase(04, "1+7")]
    [TestCase(05, "1++7+10++15+20+30++")]
    public void Split_ExamplesFromQuestion_ProducesCorrectResults(
        Int32 testIndex, String input)
    {
        Range[] ranges = RangeSplitter.Split(input);
        String rangesAsString = String.Join("+",
            ranges.Select(r => r.ToString()).ToArray());

        Assert.That(rangesAsString, Is.EqualTo(input));
    }

    [TestCase(01, 10, 10, "10")]
    [TestCase(02, 1, 10, "1++10")]
    [TestCase(03, Int32.MinValue, 10, "++10")]
    [TestCase(04, 10, Int32.MaxValue, "10++")]
    public void RangeToString_Patterns_ProducesCorrectResults(
        Int32 testIndex, Int32 lower, Int32 upper, String expected)
    {
        Range range = new Range(lower, upper);
        Assert.That(range.ToString(), Is.EqualTo(expected));
    }
}

Lasse V. Karlsen 2009-11-16 12:45:53

First, very impressive. But - I've already suggested replacing the characters (in a way), and Svish said there could be dates and a mysterious "Anything", not just numbers.

Kobi 2009-11-16 13:22:17

Woah. Will have to go through this, hehe. Some interesting things. Not sure if I quite catch what is happening with those patterns and the dictionary... but I will have a look :p(Interesting way of writing those tests too. Maybe I will rewrite mine a bit. Didn't think of using TestCase like that. What is the testIndex for? Never seen that before...)

Svish 2009-11-16 14:09:46

And, oh, yeah, that mysterious "Anything" isn't really that mysterious. But it can pretty much contain anything. Examples are numbers, dates, names, product ids, et cetera. No clue if they have a rule somewhere preventing people from entering a + into any of the names for example. But yeah... Not something I decided. I just have to work with it. It's kind of one of these "It's always been like that, so we can't change it"-things. I would love to make it more restrictive and better defined though...

Svish 2009-11-16 14:12:56

The test index is just something I use internally, just used to add them. I have a UnitTestHelper with a static Context property that I set, this allows me to set breakpoints anywhere in my class library or code and just add a condition of "UnitTestHelper.Context == 4" to only break for a particular unit test.

Lasse V. Karlsen 2009-11-16 16:57:47

As for "anything", I suggest you define the grammar first, and describe how to interpret the edge cases, *before* you write more code.

Lasse V. Karlsen 2009-11-16 16:58:25

ansaurus

tags:

views:

answers:

C#: How to split this string

related questions