I am trying to strip and replace a text string that looks as follows in the most elegant way possible:

element {"item"} {text {
          } {$i/child::itemno}

To look like:

<item> {$i/child::itemno}

That is, the element keyword is removed and its braces are replaced with angle brackets, and text is removed along with its accompanying braces. These patterns may be encountered several times. Am I better off using Java's java.util.regex.Pattern, the simple replaceAll, or org.apache.commons.lang.StringUtils?

Thanks for the responses:

I now have the following but I am unsure as to the number of backslashes and also how to complete the final substitution which makes use of my group(1) and replaces it with < at its start and > at its end:

 Pattern p = Pattern.compile("/element\\s*\\{\"([^\"]+)\"\\}\\s*{text\\s*{\\s*}\\s*({[^}]*})/ ");
             // Split input with the pattern
        Matcher m = p.matcher("element {\"item\"} {text {\n" +
                "          } {$i/child::itemno} text { \n" +
                "            } {$i/child::description} text {\n" +
                "            } element {\"high_bid\"} {{max($b/child::bid)}}  text {\n" +
                "        }}  ");

            // For each instance of group 1, replace it with < > at the start and end
A: 

I think a simple string replacement will do. Here is a Python version (can be turned into a one-liner):

>>> a = """element {"item"} {text {
          } {$i/child::itemno}"""
>>> 
>>> a
'element {"item"} {text {\n          } {$i/child::itemno}'
>>> a=a.replace(' ', '').replace('\n', '')
>>> a
'element{"item"}{text{}{$i/child::itemno}'
>>> a = a.replace('element{"', '<')
>>> a
'<item"}{text{}{$i/child::itemno}'
>>> a = a.replace('"}{text{}', '> ')
>>> a
'<item> {$i/child::itemno}'
>>> 
Hamish Grubijan
Sorry, I am new to regex. How can this be combined into a single line?
Pablo
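For a Java counterpart to the chained replacements above, written as a single expression, a minimal sketch might look like this. The names ChainedReplace and rewrite are made up for illustration. Like the Python session, it strips every space and newline from the input first, so it only suits input where no other whitespace needs to survive.

```java
class ChainedReplace {
    // String.replace(CharSequence, CharSequence) replaces every literal
    // occurrence; no regular expressions are involved here.
    public static String rewrite(String input) {
        return input.replace(" ", "")               // drop all spaces
                    .replace("\n", "")              // drop all newlines
                    .replace("element{\"", "<")     // element{"  becomes  <
                    .replace("\"}{text{}", "> ");   // "}{text{}  becomes  >
    }

    public static void main(String[] args) {
        String input = "element {\"item\"} {text {\n          } {$i/child::itemno}";
        System.out.println(rewrite(input)); // prints <item> {$i/child::itemno}
    }
}
```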
+1  A: 

Find:

/element\s*\{"([^"]+)"\}\s*{text\s*{\s*}\s*({[^}]*})/

Replace:

"<$1> $2"
nickf
Thanks for your response. Any idea how this would translate to Java, particularly the identification of <$1>?
Pablo
@pablo: parentheses. `([^"]+)` and `({[^}]*})` are the capturing groups.
fireeyedboy
Thanks. How do I carry out the replacement with <$1>, considering that it needs to put angle brackets on each side of the first group?
Pablo
In the replacement, `$1` is replaced with whatever the first parenthesized group matched; in this case that is the word "item".
nickf
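To illustrate the point about `$1`, here is a rough Java sketch showing that group 1 holds whatever name appears inside the quotes, not a fixed word. It deliberately uses a shortened pattern that only covers the element {"..."} part, and the names GroupDemo and elementNames are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Matches  element {"NAME"}  and captures NAME as group 1.
    // Literal braces are escaped for java.util.regex.
    static final Pattern NAME = Pattern.compile("element\\s*\\{\"([^\"]+)\"\\}");

    // Collects the captured name of every match found in the input.
    public static List<String> elementNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(input);
        while (m.find()) {
            names.add(m.group(1)); // text matched by the first parenthesized group
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "element {\"item\"} {text {\n} {$i/child::itemno} "
                + "element {\"high_bid\"} {{max($b/child::bid)}}";
        System.out.println(elementNames(input)); // prints [item, high_bid]
    }
}
```

The matcher tracks positions itself, so there is no need to store the name in a variable before replacing. If the offsets are ever needed, `m.start(1)` and `m.end(1)` report where group 1 was found.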
What if I can't expect the word "item", since it could be any word? Doesn't this mean I need to isolate whatever follows element {" and put it in a variable before I can replace it? If so, how do I keep track of its position? Any chance of some Java code, please?
Pablo
@Pablo: this doesn't specifically match "item" (do you see that written anywhere in my code?). It matches anything in the quotes after the word "element". This is called a regular expression, and you can read more about them at http://regular-expressions.info
nickf
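Since Java code was requested, here is one way the find/replace pair from this answer might be translated. It is a sketch under a few assumptions: the surrounding /.../ delimiters are dropped because Java patterns are plain strings, every literal brace is escaped because java.util.regex reads a bare { as the start of a quantifier, and each regex backslash is doubled inside the Java string literal. The names ElementRewriter and rewrite are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementRewriter {
    // Group 1: the name inside the quotes. Group 2: the braced expression
    // that follows the empty  text { }  block.
    private static final Pattern ELEMENT = Pattern.compile(
            "element\\s*\\{\"([^\"]+)\"\\}\\s*\\{text\\s*\\{\\s*\\}\\s*(\\{[^}]*\\})");

    public static String rewrite(String input) {
        Matcher m = ELEMENT.matcher(input);
        // $1 and $2 in the replacement refer to the captured groups, so the
        // angle brackets are simply written around $1.
        return m.replaceAll("<$1> $2");
    }

    public static void main(String[] args) {
        String input = "element {\"item\"} {text {\n          } {$i/child::itemno}";
        System.out.println(rewrite(input)); // prints <item> {$i/child::itemno}
    }
}
```

replaceAll rewrites every occurrence that matches the whole pattern and leaves the rest of the input untouched. Fragments that do not follow the element {"..."} {text { } {...} shape, such as the high_bid element in the longer sample from the question, would need a pattern of their own.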