Hello, I need to find and replace the value of a specific XML element. The conditions are as follows:

  • the value of the enabled element must be changed from 0 to 1;
  • enabled must be a child of a somenode element.

My test xml looks like this:

<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

I expect the first and third enabled elements to be changed. So far I have managed to write this sed command:

sed -n "1h;1!H;${;g;s|\(<somenode [^>]*>\)\(.*\)\(<enabled>\s*\)0\(\s*</enabled>\)\(.*</somenode>\)|\1\2\3 1 \4\5|g;p;}" test.xml

but it changes only the last one, and I believe that is due to greedy matching. Any help would be appreciated.

+3  A: 

It is generally a poor idea to use regexes to parse XML; see previous discussions such as http://stackoverflow.com/questions/335250/parsing-xml-with-regex-in-java. (Your XML is also not well-formed, since it does not have exactly one root element.) There are many free XML engines for parsing and manipulating XML in almost every language, and I'd recommend you use one of those.

peter.murray.rust
The given XML is just an excerpt, and I don't think that changes the point. The more general problem would be "replace all occurrences of a given word in text, where that word is between two other given words".
tori3852
This is a different problem, as parsing text and parsing XML are not identical. As many posters have mentioned on the page I linked, your XML may change over time, and there are also syntactic variants for XML (different quoting characters, whitespace, CDATA, etc.) which can complicate the problem. There are several different lexical forms for the same canonical XML.
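A small sketch of the whitespace point: the two strings below are equivalent XML (whitespace is allowed before the closing `>` of a start or end tag), yet a literal sed pattern only rewrites the first. The variable names and sample strings are made up for this illustration.

```shell
#!/bin/sh
# Two lexically different spellings of the same XML element.
plain='<enabled>0</enabled>'
spaced='<enabled >0</enabled >'

# A literal pattern that assumes one exact spelling.
pattern='s|<enabled>0</enabled>|<enabled>1</enabled>|'

plain_out=$(printf '%s\n' "$plain" | sed "$pattern")
spaced_out=$(printf '%s\n' "$spaced" | sed "$pattern")

echo "$plain_out"    # rewritten to <enabled>1</enabled>
echo "$spaced_out"   # left unchanged: the pattern did not match
```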
peter.murray.rust
A: 

Forget sed for complex multi-line processing. Seriously.

If you're not willing to use a proper XML tool, at least use a standard string processing tool that has proper branching statements :-)

If you can guarantee your file is formatted in the way you have it, you can use something like:

pax> echo '<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>
' | awk '
    BEGIN {s = 0}
    /^<somenode / {s=1}
    /^<\/somenode>/ {s=0}
    /^    <enabled>0<\/enabled>/ {if (s==1) {$0="    <enabled>1</enabled>"}}
    {print}
'

to get:

<somenode name="node1">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

The trouble with that sort of method is that it doesn't handle what may be perfectly valid XML files. This particular version has certain limitations such as:

  • the somenode start and end tags must be at the start of the line.
  • the enabled tag must be preceded by four spaces.

You could work around these to make it a bit more flexible but, by the time you've written your script to handle any valid XML input, it'll be such a monstrosity that it would have been quicker to use an XML transformation tool.

That's why it's better to use a tool built specifically for the job. But, if you just want a quick hack and the file format is under your control, it's probably okay to use awk (or Perl or Python or your other quick-and-dirty scripting tool of choice).
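
As a rough sketch of how the two limitations listed above might be relaxed, the variant below tolerates leading whitespace before the somenode tags and any indentation or inner padding around the enabled value. The function name `relaxed_toggle` is hypothetical, and the sketch still assumes one tag per line and no nested somenode elements, so it remains a hack rather than real XML handling.

```shell
#!/bin/sh
# relaxed_toggle: hypothetical helper; reads XML-ish text on stdin.
relaxed_toggle() {
    awk '
        # Opening tag: optional indentation, then "<somenode" followed by a blank or ">".
        /^[ \t]*<somenode[ \t>]/    { inside = 1 }
        # Closing tag: optional indentation, optional blanks before ">".
        /^[ \t]*<\/somenode[ \t]*>/ { inside = 0 }
        # Inside a somenode block, flip 0 to 1 while keeping the line indentation.
        inside { sub(/<enabled>[ \t]*0[ \t]*<\/enabled>/, "<enabled>1</enabled>") }
        { print }
    '
}

# Example run on an indented, padded sample.
printf '%s\n' \
    '  <somenode name="n1">' \
    '  <enabled> 0 </enabled>' \
    '  </somenode>' \
    '<someothernode>' \
    '<enabled>0</enabled>' \
    '</someothernode>' | relaxed_toggle
```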

paxdiablo
A: 

You can use gawk (gensub is a gawk extension, so plain awk may not work):

gawk -v RS= '/somenode/{ 
    $0=gensub("(.*<enabled>)([01])(</enabled>.*)", "\\11\\3","g",$0) 
}1'  file

output

$ ./shell.sh
<somenode name="node1">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>
<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>
<somenode name="node3">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>
ghostdog74
A: 

It seems you need a loop in sed:

http://www.rtfiber.com.tw/~changyj/sed/html/p.20070613a.html

I still can't work it out myself, though; this is just for your information.

S.Mark
A: 

Your requirement is quite simple, judging from your description, so there's no need to use XML parsers/tools if you don't want to. You can use just the shell (or whichever other shell tools you prefer):

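#!/bin/bash
flag=0                      # 1 while inside a <somenode> element
while IFS= read -r line     # IFS= and -r keep indentation and backslashes intact
do
    case "$line" in
        *"<someothernode"* ) flag=0;;
        *"</somenode"* ) flag=0;;   # leave the block at its closing tag
        *"<somenode"* ) flag=1;;
    esac
    if [ "$flag" -eq 1 ]; then
        case "$line" in
            *"<enabled"* )
                # flip 0 to 1 only inside <somenode>
                echo "${line/<enabled>0/<enabled>1}"
                ;;
            *) echo "$line";;
        esac
    else
        echo "$line"
    fi
done < "file"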
A: 

Other people have already explained why it is generally not a good idea to process XML with regular expressions.

With all that in mind, here's a sed program that replaces text matching foo with bar on every line from one matching start through the next one matching end (inclusive):

/start/,/end/s/foo/bar/
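
Applied to the sample in the question, the range form above might look like the following sketch. The wrapper name `enable_in_somenode` is hypothetical, and the usual caveats apply: it assumes the somenode tags and the enabled element each sit on their own lines and that somenode elements are not nested.

```shell
#!/bin/sh
# enable_in_somenode: hypothetical wrapper; reads the XML text on stdin.
enable_in_somenode() {
    # The range starts at a line containing "<somenode " or "<somenode>"
    # and ends at the next line containing "</somenode>".
    sed '/<somenode[ >]/,/<\/somenode>/ s|<enabled>0</enabled>|<enabled>1</enabled>|'
}

# Example run on a shortened version of the sample from the question.
printf '%s\n' \
    '<somenode name="node1">' \
    '    <enabled>0</enabled>' \
    '</somenode>' \
    '<someothernode name="node2">' \
    '    <enabled>0</enabled>' \
    '</someothernode>' \
    '<somenode name="node3">' \
    '    <enabled>0</enabled>' \
    '</somenode>' | enable_in_somenode
```

Because `someothernode` does not contain the string `<somenode`, its lines fall outside every range and are printed unchanged.
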
Jukka Matilainen
A: 

Use xmlstarlet if possible:

echo '
<root>
<somenode name="node1">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</somenode>

<someothernode name="node2">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</someothernode>

<somenode name="node3">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</somenode>
</root>
' > testfile.xml


xml val testfile.xml
xml el -v testfile.xml

xml ed --help

# version 1
xml ed -u "//somenode[1]/enabled" -v '1' \
       -u "//somenode[2]/enabled" -v '1' \
       testfile.xml  

# version 2  (-L for in-place editing; xmlstarlet v1.0.2)
xml ed -L -u "//somenode[@name='node1']/enabled" -v '1' \
          -u "//somenode[@name='node3']/enabled" -v '1' \
          testfile.xml  
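
Both versions above list the target nodes one by one. A single XPath expression can probably cover every somenode at once; the sketch below assumes the binary is installed as either `xmlstarlet` or `xml` (the name depends on the package), and the temporary file and variable names are made up for the example.

```shell
#!/bin/sh
# Assumption: the tool is available as "xmlstarlet" or, as in the answer, "xml".
XML_BIN=$(command -v xmlstarlet || command -v xml)

# Small well-formed sample written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<root>
<somenode name="node1"><enabled>0</enabled></somenode>
<someothernode name="node2"><enabled>0</enabled></someothernode>
<somenode name="node3"><enabled>0</enabled></somenode>
</root>
EOF

if [ -n "$XML_BIN" ]; then
    # Update every enabled element that is a child of somenode and currently holds 0.
    updated=$("$XML_BIN" ed -u "//somenode/enabled[.='0']" -v '1' "$sample")
    printf '%s\n' "$updated"
else
    echo "xmlstarlet not found; skipping the example" >&2
fi
rm -f "$sample"
```

The predicate `[.='0']` restricts the update to elements whose text is 0, which matches the condition stated in the question.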
yabt