Hi,

I have to read in a file that contains a number of coordinates. The file is structured in the following way:

X1/Y1,X2/Y2,X3/Y3,X4/Y4

Where X and Y are positive integers. To solve this I want to use a regex, since I think that generally keeps refactoring to a minimum when the pattern changes.

Therefore I have developed the following regex:

Regex r = new Regex(@"^(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))(,(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+)))*$");

However when I test this regex on data, for example:

1302/1425,1917/2010

The regex only seems to keep the last X, Y and Coor group. In this case Coor is "1917/2010", X is "1917" and Y is "2010". Is there a way to get some sort of tree, i.e. an object that gives me all the Coor matches, each with its own X and Y components?

If possible, I would like to use only one regex, because the format might change to a different one later.

+5  A: 

You can quite easily solve this without any regular expression by using string.Split and int.Parse:

var coords = s.Split(',')
    .Select(x => x.Split('/'))
    .Select(a => new {
        X = int.Parse(a[0]),
        Y = int.Parse(a[1])
    });

If you want to use a regular expression to validate the string you could do it like this:

"^(?!,)(?:(?:^|,)[0-9]+/[0-9]+)*$"
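To see what the validation pattern accepts and rejects, here is a minimal sketch that wraps it in a helper. The class and method names are hypothetical. Because the repeated group uses *, the pattern also accepts an empty string, which may or may not be what you want.

```csharp
using System.Text.RegularExpressions;

public static class CoordinateValidator
{
    // Validation pattern from the answer: zero or more X/Y pairs separated by
    // single commas. The (?!,) lookahead rejects a leading comma; note that the
    // empty string is accepted because of the trailing *.
    static readonly Regex Valid = new Regex("^(?!,)(?:(?:^|,)[0-9]+/[0-9]+)*$");

    // Hypothetical helper: true if the whole line has the X1/Y1,X2/Y2,... shape.
    public static bool IsValid(string line) => Valid.IsMatch(line);
}
```

For example, IsValid("1302/1425,1917/2010") returns true, while IsValid("1302/1425,"), IsValid(",1302/1425") and IsValid("1302/1425,,1917/2010") return false.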

If you want to use a regular-expression-based approach for extracting the data as well, you could first validate the string using the above regular expression and then extract the data as follows:

var coords = Regex.Matches(s, "([0-9]+)/([0-9]+)")
    .Cast<Match>()
    .Select(match => new
    {
        X = int.Parse(match.Groups[1].Value),
        Y = int.Parse(match.Groups[2].Value)
    });

If you really want to perform the validation and data extraction simultaneously with a single regular expression you can use two capturing groups and find the results in the Captures property for each group. Here's one way you could perform both the validation and data extraction using a single regular expression:

List<Group> groups =
    Regex.Matches(s, "^(?!,)(?:(?:^|,)([0-9]+)/([0-9]+))*$")
         .Cast<Match>().First()
         .Groups.Cast<Group>().Skip(1)
         .ToList();

var coords = Enumerable.Range(0, groups[0].Captures.Count)
    .Select(i => new
    {
        X = int.Parse(groups[0].Captures[i].Value),
        Y = int.Parse(groups[1].Captures[i].Value)
    });
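The same Captures idea also works with the named groups from the original question, which gives roughly the Coor/X/Y "tree" being asked for: Group.Value only returns the last capture, but Group.Captures keeps all of them. A minimal sketch, assuming one line of input at a time; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupCaptures
{
    // The question's pattern; .NET allows the same group name (Coor, X, Y) to be reused.
    static readonly Regex CoordLine = new Regex(
        @"^(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))(,(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+)))*$");

    // Hypothetical helper: returns each coordinate as (Text, X, Y),
    // or an empty list if the line does not match the pattern.
    public static List<(string Text, int X, int Y)> ParseLine(string line)
    {
        var result = new List<(string Text, int X, int Y)>();
        Match m = CoordLine.Match(line);
        if (!m.Success)
            return result;

        // Group.Value is only the LAST capture; Group.Captures keeps every one.
        CaptureCollection coor = m.Groups["Coor"].Captures;
        CaptureCollection xs = m.Groups["X"].Captures;
        CaptureCollection ys = m.Groups["Y"].Captures;

        // Every Coor capture contains exactly one X and one Y capture,
        // so the three collections line up index by index.
        for (int i = 0; i < coor.Count; i++)
            result.Add((coor[i].Value, int.Parse(xs[i].Value), int.Parse(ys[i].Value)));

        return result;
    }
}
```

For the line 1302/1425,1917/2010 this yields ("1302/1425", 1302, 1425) and ("1917/2010", 1917, 2010), whereas m.Groups["Coor"].Value on the same match is only "1917/2010".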

However, you may want to consider whether the complexity of this solution is worth it compared to the string.Split-based solution.

Mark Byers
+1 for being so comprehensive
Gabe Moothart
+2  A: 

You might get what you're after if you use the "Matches" method rather than "Match". Also, couldn't you shorten the regex to something like this:

new Regex(@"((?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))|,)*");
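One way this shorter pattern could be used with Matches is sketched below. Because the whole pattern can match zero characters, Matches can also return zero-length matches (at least one at the end of the string); collecting the X/Y captures from every match sidesteps that, since an empty match contains no captures. The class and method names are hypothetical, and note that this pattern does no real validation.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StarredAlternation
{
    // Pattern from this answer; it can match the empty string.
    static readonly Regex Pattern = new Regex(@"((?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))|,)*");

    // Hypothetical helper: gathers every X/Y pair from all matches.
    public static List<(int X, int Y)> Extract(string line)
    {
        var result = new List<(int X, int Y)>();
        foreach (Match m in Pattern.Matches(line))
        {
            // Zero-length matches have no X/Y captures, so they add nothing here.
            CaptureCollection xs = m.Groups["X"].Captures;
            CaptureCollection ys = m.Groups["Y"].Captures;
            for (int i = 0; i < xs.Count; i++)
                result.Add((int.Parse(xs[i].Value), int.Parse(ys[i].Value)));
        }
        return result;
    }
}
```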
Brent Arias
+1  A: 

I think your first problem is that your regex is flawed: the anchors are throwing off the matching. This is the one I came up with (just the regex shown here, no code):

(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))

The one from Mystagogue works as well, but (for me) it produces 'blank' matches on the commas.
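Used with Regex.Matches, the unanchored pattern above yields one match per coordinate, so each match's X and Y groups hold exactly one value. A minimal sketch; the class and method names are hypothetical. Because the pattern is not anchored it does not validate the line (it would also pick pairs out of surrounding junk), so you may want to check the line separately first.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PerMatchCoordinates
{
    // Unanchored pattern from this answer: every match is one coordinate.
    static readonly Regex Coord = new Regex(@"(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))");

    // Hypothetical helper: one (X, Y) tuple per match, in input order.
    public static List<(int X, int Y)> Extract(string line) =>
        Coord.Matches(line)
             .Cast<Match>()
             .Select(m => (X: int.Parse(m.Groups["X"].Value), Y: int.Parse(m.Groups["Y"].Value)))
             .ToList();
}
```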

Coding Gorilla
+3  A: 

There is no reason to use a regular expression for such a simple format.

Just split the string and use plain string operations to get the coordinates:

var coordinates =
  fileContent.Split(',').Select(s => {
    int pos = s.IndexOf("/");
    return new {
      X = s.Substring(0, pos),
      Y = s.Substring(pos + 1)
    };
  });

If the file format gets much more complicated you can refactor it into using a regular expression. Until then, simple code like this is much easier to maintain.

Guffa