I have a table like:

A | 1  
A | 2  
B | 1  
B | 2  
B | 3

I'm trying to transform it to look like this:

A { 1 | 2 }  
B { 1 | 2 | 3 }

I've come up with this, which matches correctly, but I can't figure out how to get the repeated capture out.

(A|B)|(\d)(\r\n\1|(\d))*

UPDATE

I realize that this would be fairly trivial in a programming language, but I was hoping to learn something more about regular expressions.

+1  A: 

Here is some Java code that may be helpful:

    String text =   "A | 1\n" +
                    "A | 2\n" +  
                    "B | 1\n" +
                    "B | 2\n" +
                    "B | 3\n" +
                    "A | x\n" +
                    "D | y\n" +
                    "D | z\n";
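    // Split at each newline where the next line starts with a different key;
    // the lookbehind captures the current line's key as group 1.
    String[] sections = text.split("(?<=(.) . .)\n(?!\\1)");
    StringBuilder sb = new StringBuilder();
    for (String section : sections) {
        sb.append(section.substring(0, 1)).append(" {")
          // Drop the newline and the repeated key from every line after the first.
          .append(section.substring(3).replaceAll("\n.", ""))
          .append(" }\n");
    }
    System.out.println(sb.toString());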

This prints:

A { 1 | 2 }
B { 1 | 2 | 3 }
A { x }
D { y | z }

The idea is to do this in two steps:

  • First, split into sections
  • Then transform each section
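The two steps can also be sketched as separate helper methods, which makes the intermediate sections easy to inspect. This is only a sketch: the class and method names (`SplitSectionsSketch`, `splitSections`, `transform`) are made up for illustration, and it assumes, like the code above, that every line has the form `K | V` with a single-character key and a single-character value.

```java
class SplitSectionsSketch {
    // Step 1: split at each newline that is followed by a different key.
    // Assumes single-character keys and values ("K | V" lines).
    static String[] splitSections(String text) {
        return text.split("(?<=(.) . .)\n(?!\\1)");
    }

    // Step 2: keep the key once, then strip "\n" plus the repeated key
    // from every following line of the section.
    static String transform(String section) {
        return section.substring(0, 1) + " {"
             + section.substring(3).replaceAll("\n.", "")
             + " }";
    }

    public static void main(String[] args) {
        String text = "A | 1\nA | 2\nB | 1\nB | 2\nB | 3\n";
        for (String section : splitSections(text)) {
            System.out.println(transform(section));
        }
    }
}
```

With the sample input from the question, the sections are `"A | 1\nA | 2"` and `"B | 1\nB | 2\nB | 3"`. Keys or values longer than one character would need a different lookbehind.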

A single replaceAll variant

If you intersperse { and } in the input so that they can be captured and rearranged in the output, this is possible with a single replaceAll (i.e. an entirely regex solution):

    String text =   "{ A | 1 }" +
                    "{ A | 2 }" +
                    "{ B | 1 }" +
                    "{ B | 2 }" +
                    "{ B | 3 }" +
                    "{ C | 4 }" +
                    "{ D | 5 }";
    System.out.println(
        text.replaceAll("(?=\\{ (.))(?<!(?=\\1).{7})(\\{)( )(.) .|(?=\\}. (.))(?:(?<=(?=\\5).{6}).{5}|(?<=(.))(.))", "$4$3$2$7$6")
    );

This prints (see output on ideone.com):

A { 1 | 2 } B { 1 | 2 | 3 } C { 4 } D { 5 }

Unfortunately, no, I don't think this is worth explaining. It's far too complicated for what's being accomplished. Essentially, though, it uses lots of assertions, nested assertions, and capture groups (some of which will be empty strings, depending on which assertion passes).

This is, without a doubt, the most complicated regex I've written.
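For readers who want a feel for the ingredients without deciphering the whole pattern, here is a toy sketch of two of them in isolation: capture groups that come out empty when their alternative does not take part in the match, and a lookahead nested inside a lookbehind. It is not a breakdown of the regex above; the class and method names are made up for illustration.

```java
class AssertionIngredients {
    // Only one alternative participates in each match; the group that belongs
    // to the other alternative contributes nothing to the replacement.
    static String emptyGroups(String s) {
        return s.replaceAll("(a)|(\\d)", "[$1$2]");
    }

    // A lookahead nested inside a lookbehind: (?=(.)) captures the next
    // character, then the lookbehind steps back two characters and checks
    // that the text there starts with that same character.
    static String markRepeats(String s) {
        return s.replaceAll("(?=(.))(?<=(?=\\1)..)", "^");
    }

    public static void main(String[] args) {
        System.out.println(emptyGroups("a1"));   // [a][1]
        System.out.println(markRepeats("abab")); // ab^a^b
    }
}
```

The single-replaceAll pattern combines the same ideas: each alternative fills only some of the groups, so a replacement string such as `$4$3$2$7$6` expands differently depending on which branch matched.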

polygenelubricants
My head a splode!
Alan Moore
@Alan: check out the latest revision. My own head exploded.
polygenelubricants
Cool, I'll have to spend a few minutes deciphering the regex; that should feed my 'want to learn something' bug. Ultimately, as you have concluded as well, this is a job that should be done with code, not regex. Thanks!
Zac