I am trying to strip and replace a text string that looks as follows in the most elegant way possible:

element {"item"} {text { 
          } {$i/child::itemno} 

To look like:

<item> {$i/child::itemno} 

That is: remove the element keyword, replace the braces and quotes around its name with angle brackets, and remove text together with its accompanying braces.

I believe the appropriate regex to do this is:

/element\s*\{"([^"]+)"\}\s*{text\s*{\s*}\s*({[^}]*})/

but I am unsure how many backslashes to use in Java, and also how to complete the final substitution, which should take my group(1) and put < at its start and > at its end.

So far I have this (although perhaps I would be better off with a full rewrite?):

Pattern p = Pattern.compile("/element\\s*\\{\"([^\"]+)\"\\}\\s*{text\\s*{\\s*}\\s*({[^}]*})/ ");
// Split input with the pattern
Matcher m = p.matcher("element {\"item\"} {text {\n" +
        "          } {$i/child::itemno} text { \n" +
        "            } {$i/child::description} text {\n" +
        "            } element {\"high_bid\"} {{max($b/child::bid)}}  text {\n" +
        "        }}  ");

// Next for each instance of group 1, replace it with < > at the start


I think I've stumbled across a problem. What I am trying to do is somewhat harder than I previously stated. With the solution given below, this input:


element {"item"} {text { } {$i/child::itemno} text { } {$i/child::description} text { } element {"high_bid"} {{max($b/child::bid)}} text { }}

gives:

<item> {$i/child::itemno} text { } {$i/child::description} text { } element {"high_bid"} {{max($b/child::bid)}} text { }}

When I expected:

<item>{$i/child::itemno}{$i/child::description}<high_bid>{fn:max($b/child::bid)}</high_bid></item>
+2  A: 
  1. Java regexes are written without delimiters, so lose the forward slashes;
  2. inside a Java string literal every single backslash needs one extra, so \s becomes \\s;
  3. every { needs to be escaped: \\{, while } needs no escape (although it doesn't hurt if you do escape it).
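
To make the third point concrete, here is a small sketch that checks which brace spellings java.util.regex accepts. The class name BraceEscapeDemo and the helper compiles are made up for this illustration; only Pattern.compile and PatternSyntaxException come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question or answer.
final class BraceEscapeDemo {

    // Returns true if the given regex source compiles, false if Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Opening braces escaped, closing brace left bare: accepted.
        System.out.println(compiles("\\{text\\s*\\{\\s*}"));

        // Unescaped opening brace: Java tries to read it as a repetition
        // count such as {2,3} and rejects the pattern.
        System.out.println(compiles("{text\\s*{\\s*}"));

        // The accepted form matches the literal text block.
        System.out.println("text { }".matches("text\\s*\\{\\s*}"));
    }
}
```

The middle call is the situation in the pattern from the question: apart from the slashes, the unescaped { is what stops it from compiling.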

Try:

String text = "element {\"item\"} {text { } {$i/child::itemno}";
System.out.println(text.replaceAll("element\\s*\\{\"([^\"]+)\"}\\s*\\{text\\s*\\{\\s*}\\s*(\\{[^}]*})", "<$1> $2"));

Output:

<item> {$i/child::itemno} 
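
If you would rather keep the Pattern/Matcher style from the question and build the replacement from group(1) yourself, the following is one possible sketch. The class name ElementRewrite and the method rewrite are invented for this example. The important detail is Matcher.quoteReplacement: group(2) contains $i, and a raw $ in a replacement string passed to appendReplacement would be read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, sketched for illustration only.
final class ElementRewrite {

    // Same regex as in the replaceAll call, compiled once.
    private static final Pattern ELEMENT = Pattern.compile(
            "element\\s*\\{\"([^\"]+)\"}\\s*\\{text\\s*\\{\\s*}\\s*(\\{[^}]*})");

    static String rewrite(String input) {
        Matcher m = ELEMENT.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is the element name, group(2) the braced expression.
            String replacement = "<" + m.group(1) + "> " + m.group(2);
            // Quote the replacement so the "$i" inside group(2) stays literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "element {\"item\"} {text { } {$i/child::itemno}";
        System.out.println(rewrite(text)); // prints: <item> {$i/child::itemno}
    }
}
```

Like the one-liner, this only rewrites spans that match the whole pattern; anything after the match is copied through untouched.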
Bart Kiers
Thanks! That works a treat (and was relatively painless, I think!)
Pablo
You're welcome. The painfulness depends on how proficient you are with regex. Someone unfamiliar with regex is likely to disagree with you. :)
Bart Kiers