ansaurus

Question

Regex to split nested coordinate strings.

Answer 1

+5 A:

You can use String#split() for this.

String string = "[(1, 2), (2, 3), (3, 4)]";
string = string.substring(1, string.length() - 1); // Get rid of the square brackets.
String[] parts = string.split("(?<=\\))(,\\s*)(?=\\()");
for (String part : parts) {
    part = part.substring(1, part.length() - 1); // Get rid of parentheses.
    String[] coords = part.split(",\\s*");
    int x = Integer.parseInt(coords[0]);
    int y = Integer.parseInt(coords[1]);
    System.out.printf("x=%d, y=%d\n", x, y);
}

The (?<=\\)) positive lookbehind means that the match must be preceded by ). The (?=\\() positive lookahead means that it must be followed by (. The (,\\s*) means that the string is split on the , and any whitespace after it. The \\ are there just to escape regex-specific characters (doubled because the pattern is written as a Java string literal).

That said, this particular String is recognizable as the output of List#toString(). Are you sure you're doing things the right way? ;)

Update: as per the comments, you can indeed also do it the other way round and split on the non-digits:

String string = "[(1, 2), (2, 3), (3, 4)]";
String[] parts = string.split("\\D.");
for (int i = 1; i < parts.length; i += 3) {
    int x = Integer.parseInt(parts[i]);
    int y = Integer.parseInt(parts[i + 1]);
    System.out.printf("x=%d, y=%d\n", x, y);
}

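Here the \\D means that the string is split on any non-digit (the \\d stands for digit). The . after it consumes the character that follows, so a two-character separator such as ", " is matched in one go instead of leaving a blank part after the digit. Blank parts do remain at the start and wherever two separators are adjacent (the ")," followed by " ("), which is why the loop starts at index 1 and steps by 3. I must however admit that I'm not sure how to eliminate those blank matches before the digits. I'm not a trained regex guru yet. Hey, Bart K, can you do it better?

After all, it's ultimately better to use a parser for this. See Hubert's answer on this topic.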
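One possible way around the blank parts, as a minimal sketch rather than a definitive fix: split on whole runs of non-digits with \\D+ and skip the one empty string that a leading separator still produces. The class and method names below (DigitSplitDemo, numbers) are made up for illustration, and the sketch assumes non-negative integers only.

```java
import java.util.ArrayList;
import java.util.List;

final class DigitSplitDemo {
    // Returns the integers found in the input, in order of appearance.
    static List<Integer> numbers(String input) {
        List<Integer> result = new ArrayList<>();
        // "\\D+" treats every run of non-digits as a single separator.
        for (String part : input.split("\\D+")) {
            // A leading separator such as "[(" still yields one empty first element.
            if (!part.isEmpty()) {
                result.add(Integer.parseInt(part));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> n = numbers("[(1, 2), (2, 3), (3, 4)]");
        // Consecutive numbers form one (x, y) pair.
        for (int i = 0; i + 1 < n.size(); i += 2) {
            System.out.printf("x=%d, y=%d%n", n.get(i), n.get(i + 1));
        }
    }
}
```

This also copes with multi-digit values, but like any digit-only approach it does not validate the surrounding syntax.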

BalusC 2010-02-01 21:08:09

There are commas in the substrings as well... You can `string.split("),");` and then put the `)` back afterwards.

Y. Shoham 2010-02-01 21:09:06

Oops, didn't notice that .. Updated answer.

BalusC 2010-02-01 21:16:27

Well spotted! I'm trying to reproduce a list of coordinates from, ahem, a List<Coordinate> effectively.

Beau Martínez 2010-02-01 21:16:59

@Beau, and you have no reference to that List any more? It is a bit brittle to create it from the output of a `toString()` return...

Bart Kiers 2010-02-01 21:23:13

@Bart If only! I'm retrieving Strings representing a series of moves from a game via a web service. Strong typing FTW!

Beau Martínez 2010-02-01 21:26:25

:) (15 char fill)

Bart Kiers 2010-02-01 21:28:19

@Beau, I now see what you need. I added a few more lines to get the coords out.

BalusC 2010-02-01 21:34:20

Great stuff. This tempted me to mess around with regular expressions and I came up with \([0-9], [0-9]\) to NOT include anything that has the form of coordinates. It would be nice to get it working with a negative lookaround as explained in this link: http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word

James P. 2010-02-01 21:40:52

Whau, didn't know you could do *that* with a regular expression! Guess I need to fetch "Mastering Regular Expressions" from the shelf and read up on this stuff :)

Jørn Schou-Rode 2010-02-01 21:47:41

That being said, in the particular case of parsing coordinates, I would recommend the simpler/more comprehensible solution from my answer or the `Scanner` solution suggested by Hubert.

Jørn Schou-Rode 2010-02-01 21:51:48

Yes, strings like that are after all better parsed/tokenized.

BalusC 2010-02-01 22:10:16

Answer 2

+1 A:

Will there always be 3 groups of coordinates? You could try:

\[(\(\d,\s*\d\)), (\(\d,\s*\d\)), (\(\d,\s*\d\))\]

FrustratedWithFormsDesigner 2010-02-01 21:08:58

Not necessarily! I'll edit the question; cheers on the quick reply. I'm assuming some ?*+ quantifiers will do the trick from there?

Beau Martínez 2010-02-01 21:11:22

Answer 3

+3 A:

If you do not require the expression to validate the syntax around the coordinates, this should do:

\(\d+,\s\d+\)

This expression will return several matches (three with the input from your example).

In your question, you state that you want to "retrieve (1, 2), (2, 3), and (3, 4)". In case you actually need the pair of values associated with each coordinate, you can drop the parentheses and modify the regex to do some captures:

(\d+),\s(\d+)

The Java code will look something like this:

import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+),\\s(\\d+)");
        Matcher matcher = pattern.matcher("[(1, 2), (2, 3), (3, 4)]");

        while (matcher.find()) {
            int x = Integer.parseInt(matcher.group(1));
            int y = Integer.parseInt(matcher.group(2));
            System.out.printf("x=%d, y=%d\n", x, y);
        }
    }
}

Jørn Schou-Rode 2010-02-01 21:10:46

All I get are the brackets! :/

Beau Martínez 2010-02-01 21:22:11

I have added a Java code sample showing how to use the regex. Does this fail as well?

Jørn Schou-Rode 2010-02-01 21:24:37

`Integer.parse(...)` does not work: it's `Integer.parseInt(...)`. I took the liberty to edit it and post a working example of your snippet.

Bart Kiers 2010-02-01 21:27:18

The regex returns "), ("s and "("s and ")"s; I'm using String.split(), should I use Matcher and use groups instead?

Beau Martínez 2010-02-01 21:28:00

Bart Kiers 2010-02-01 21:31:49

@Beau: The regex I have posted will match the actual coordinates, so using it with `String.split()` will give you a lot of `), (` matches. The code sample in my answer should guide you on your way, though.

Jørn Schou-Rode 2010-02-01 21:34:44

@Bart: Thanks for fixing my broken code :)

Jørn Schou-Rode 2010-02-01 21:35:28

Answer 4

A:

In regexes, you can split on (?<=\)), (a comma preceded by a closing parenthesis), which uses a positive lookbehind:

string[] subs = str.replaceAll("\[","").replaceAll("\]","").split("(?<=\)),");

With simple string functions, you can drop the [ and ], split on ")," and then put the ) back on each piece. (In Java, String#split() takes a regex, so the ) has to be escaped there.)
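As posted, the one-liner above is not valid Java (lowercase string[], single backslashes inside a string literal). A minimal compilable sketch of the same lookbehind idea follows. The optional whitespace after the comma is an addition so that the pieces do not start with a space, and the names LookbehindSplitDemo and pairs are made up for illustration.

```java
import java.util.Arrays;

final class LookbehindSplitDemo {
    static String[] pairs(String str) {
        // Strip the enclosing square brackets first.
        String inner = str.replaceAll("[\\[\\]]", "");
        // Split on a comma (plus optional whitespace) only when it directly follows ")".
        return inner.split("(?<=\\)),\\s*");
    }

    public static void main(String[] args) {
        // Each element is one "(x, y)" substring, parentheses included.
        System.out.println(Arrays.toString(pairs("[(1, 2), (2, 3), (3, 4)]")));
    }
}
```

The commas inside each pair are left alone because they follow a digit, not a closing parenthesis.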

Y. Shoham 2010-02-01 21:13:26

Your regex produces `(1`, `2), (2`, `3), (3` and `4)` on given example?

BalusC 2010-02-01 21:19:41

Oops. I fixed from Negative to Positive. Now it should work.

Y. Shoham 2010-02-01 21:23:35

The `"(?<=\\)),\\s*"` would be nicer as it covers spaces as well. By the way, in Java regex strings you need to double-escape the \.

BalusC 2010-02-01 21:25:07

Right again. :)

Y. Shoham 2010-02-01 21:26:21

Answer 5

+1 A:

If you use regex, you are going to get lousy error reporting and things will get exponentially more complicated if your requirements change (For instance, if you have to parse the sets in different square brackets into different groups).

I recommend you just write the parser by hand, it's like 10 lines of code and shouldn't be very brittle. Track everything you are doing, open parens, close parens, open braces & close braces. It's like a switch statement with 5 options (and a default), really not that bad.

For a minimal approach, open parens and open braces can be ignored, so there are really only 3 cases.

This would be the bare minimum.

// Java sketch: `input` is the string to parse, `output` is a placeholder collector.
// Needs java.util.StringTokenizer.
int valueA = 0;
String lastValue = "";
StringTokenizer tokens = new StringTokenizer(input, "[](),", true);

while (tokens.hasMoreTokens()) {
    // trim() drops the space that follows each comma
    String token = tokens.nextToken().trim();

    // The token before the ) is the second int of the pair, and the first should
    // already be stored
    if (token.equals(")"))
        output.addResult(valueA, Integer.parseInt(lastValue));

    // The token before the comma is the first int of the pair
    else if (token.equals(","))
        valueA = Integer.parseInt(lastValue);

    // Just store off this token and deal with it when we hit the proper delim
    else
        lastValue = token;
}

This is no better than a minimal regex based solution EXCEPT that it will be MUCH easier to maintain and enhance. (add error checking, add a stack for paren & square brace matching and checking for misplaced commas and other invalid syntax)

As an example of expandability, if you were to have to place different sets of square-bracket delimited groups into different output sets, then the addition is something as simple as:

    // When we close the square bracket, start a new output group.
    else if(token.equals("]"))
        output.startNewGroup();

And checking for parens is as easy as creating a stack of chars and pushing each [ or ( onto the stack, then when you get a ] or ), pop the stack and assert that it matches. Also, when you are done, make sure your stack.size() == 0.
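To make the stack idea concrete, here is a self-contained sketch of the tokenizer loop with bracket matching added. It is one possible reading of the description, not a finished parser: the class name CoordParser, the int[] pair representation, and the choice of IllegalArgumentException are all assumptions, and it checks only bracket balance, not misplaced commas or missing numbers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.StringTokenizer;

final class CoordParser {
    // Parses "[(1, 2), (2, 3)]" into {x, y} pairs; throws on unbalanced brackets.
    static List<int[]> parse(String input) {
        List<int[]> pairs = new ArrayList<>();
        Deque<Character> open = new ArrayDeque<>(); // currently open [ and (
        int x = 0;
        String lastValue = "";
        StringTokenizer tokens = new StringTokenizer(input, "[](),", true);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken().trim();
            if (token.isEmpty()) {
                continue; // whitespace between two pairs
            }
            switch (token) {
                case "[":
                case "(":
                    open.push(token.charAt(0));
                    break;
                case "]":
                case ")":
                    char expected = token.equals(")") ? '(' : '[';
                    if (open.isEmpty() || open.pop() != expected) {
                        throw new IllegalArgumentException("Unbalanced " + token);
                    }
                    if (token.equals(")")) {
                        // The token before ) is y; x was stored at the comma.
                        pairs.add(new int[] {x, Integer.parseInt(lastValue)});
                    }
                    break;
                case ",":
                    // Only a comma inside parentheses separates x from y.
                    if (!open.isEmpty() && open.peek() == '(') {
                        x = Integer.parseInt(lastValue);
                    }
                    break;
                default:
                    lastValue = token;
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed " + open.peek());
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (int[] p : parse("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.printf("x=%d, y=%d%n", p[0], p[1]);
        }
    }
}
```

Non-numeric tokens surface as NumberFormatException, which is itself an IllegalArgumentException, so callers can catch one exception type for all malformed input.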

Bill K 2010-02-01 21:32:01

...You might be on to something here... Any chance you could mock-up some code?

Beau Martínez 2010-02-01 21:34:00

This sounds like the event-driven approach SAX uses to parse XML. I suppose you'll need to go through the text character by character and build up a series of algorithms to detect various patterns.

James P. 2010-02-01 21:48:58

Answer 6

+7 A:

If you can use Java 5 or later, a Scanner will do:

    Scanner sc = new Scanner(coords);
    sc.useDelimiter("\\D+"); // treat every run of non-digits as a delimiter
    List<Coord> result = new ArrayList<Coord>();
    while (sc.hasNextInt()) {
        result.add(new Coord(sc.nextInt(), sc.nextInt()));
    }
    return result;

EDIT: we don't know how many coordinates are passed in the string 'coords'
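One caveat: a delimiter made of non-digits also swallows minus signs, so negative coordinates would come out positive. A minimal sketch that keeps the sign, swapping the Scanner for a Matcher (so not the approach above, just an alternative under the same input format), could look like this. The names SignedCoords and PAIR and the int[] pair representation are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SignedCoords {
    // One "(x, y)" pair; each number may carry a leading minus sign.
    private static final Pattern PAIR =
            Pattern.compile("\\(\\s*(-?\\d+)\\s*,\\s*(-?\\d+)\\s*\\)");

    static List<int[]> parse(String coords) {
        List<int[]> result = new ArrayList<>();
        Matcher m = PAIR.matcher(coords);
        // find() walks through the string one pair at a time.
        while (m.find()) {
            result.add(new int[] {
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2))
            });
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] p : parse("[(-1, 2), (3, -4)]")) {
            System.out.printf("x=%d, y=%d%n", p[0], p[1]);
        }
    }
}
```

Text that does not look like a pair is skipped silently, so this is still extraction rather than validation.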

Hubert 2010-02-01 21:49:29

Nice solution! And if you replace `Coord` with `java.awt.Point` it compiles as it is.

Fabian Steeg 2010-02-01 22:23:04

Watch out for negative values!

notnoop 2010-02-02 15:00:41

@notnoop: true, and as strange as it seems I couldn't succeed in using a delimiter pattern like `"[^-0-9]*"`; I had to use something less trivial like `"[^0-9]*[(),]\\s*"`. I'm on Sun JDK6.

Hubert 2010-02-02 17:01:03

I love this! However as I asked for the regex I'll choose the best regex answer as the correct one for the sake of people with a similar question ;) 1 INTERNET FOR YOU

Beau Martínez 2010-02-08 00:20:50
