views:

268

answers:

6

I have an XML file with the contents:

<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>

I need a way to extract the text between the <job ...> and </job> tags, "programming" in this case. This should be done at the Linux command prompt, using grep/sed/awk.

A: 

How about:

cat a.xml | grep '<job' | cut -d '>' -f 2 | cut -d '<' -f 1
codaddict
UUOC. `grep '<job' a.xml | ...`
ghostdog74
@ghost *but but but, I think it's cleaner / nicer / not that much of a waste / my privilege to waste processes!* http://partmaps.org/era/unix/award.html#cat (actually, I think it's easier to edit the filename, because it's nearer the start)
13ren
+9  A: 

Do you really have to use only those tools? They're not designed for XML processing, and although it's possible to get something that works OK most of the time, it will fail on edge cases, like encoding, line breaks, etc.

I recommend xml_grep:

xml_grep 'job' jobs.xml --text_only

Which gives the output:

programming

On Ubuntu/Debian, xml_grep is in the xml-twig-tools package.
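
To illustrate the kind of edge case meant above, here is a small sketch that feeds the same element through the grep/cut pipeline suggested elsewhere in this thread, once on a single line and once with a line break after the opening tag. The function name extract_with_cut is made up for this example, and the inputs are just the sample element from the question.

```shell
# Hypothetical helper: the grep/cut pipeline from this thread, reading stdin.
extract_with_cut() {
    grep '<job' | cut -d '>' -f 2 | cut -d '<' -f 1
}

# Element on a single line: the text is extracted.
same_line=$(printf '%s\n' \
    '<job xmlns="http://www.sample.com/">programming</job>' | extract_with_cut)

# Same element with a line break after the opening tag: grep only keeps
# the line containing '<job', so the text on the next line is lost.
split_line=$(printf '%s\n' \
    '<job xmlns="http://www.sample.com/">' \
    'programming</job>' | extract_with_cut)

printf 'same line:  [%s]\n' "$same_line"
printf 'split line: [%s]\n' "$split_line"
```

Both inputs are the same well-formed XML, but only the first one yields the text, which is the sort of breakage an XML-aware tool avoids.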

amarillion
+2  A: 
 grep '<job' file_name | cut -f2 -d">"|cut -f1 -d"<"
Vijay Sarathi
only that it fails if tags are on separate lines
ghostdog74
There are about a dozen other ways that well-formed XML can make that fail.
Robert Rossney
+1  A: 

Just use awk; no other external tools are needed. The following works even if the desired tags span multiple lines.

$ cat file
test
<job xmlns="http://www.sample.com/">programming</job>
<job xmlns="http://www.sample.com/">
programming</job>

$ awk -vRS="</job>" '{gsub(/.*<job.*>/,"");print}' file
programming

programming
ghostdog74
+1  A: 

Assuming same line, input from stdin:

sed -ne '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; p }'

Notes: -n stops sed from printing every line automatically; -e gives the script inline as a one-liner (as opposed to a script file); /<\/job>/ selects matching lines, like grep; s strips the opening tag (with its attributes) and the closing tag; ; separates statements; p prints; {} groups both statements so the match applies to them as one.
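
As a quick check, here is a minimal sketch that runs the command above against the two-line sample from the question. The wrapper name extract_job is made up for illustration, and GNU sed is assumed (other sed implementations may want a ; before the closing brace).

```shell
# Hypothetical wrapper around the sed one-liner; reads XML on stdin.
extract_job() {
    sed -ne '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; p }'
}

# Feed it the sample document from the question, one line per argument.
result=$(printf '%s\n' \
    '<?xml version="1.0" encoding="utf-8"?>' \
    '<job xmlns="http://www.sample.com/">programming</job>' | extract_job)

printf '%s\n' "$result"
```

The XML declaration line is skipped because it does not contain </job>, so only the element text is printed.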

13ren
A: 

Using xmlstarlet:

echo '<job xmlns="http://www.sample.com/">programming</job>' | \
   xmlstarlet sel -N var="http://www.sample.com/" -t -m "//var:job" -v '.'
lmxy