Ideally, what I'd like to be able to do is:

cat xhtmlfile.xhtml |
getElementViaXPath --path='/html/head/title' |
sed -E -e 's%(^<title>|</title>$)%%g' > titleOfXHTMLPage.txt
+1  A: 

You can use the xpath utility. I believe it ships with Perl's XML::XPath module.

alamar
+2  A: 

I am not aware of any pure shell XML parsing tool, so you will most likely need a tool written in another language.

My XML::Twig Perl module comes with such a tool: xml_grep. You would probably write what you want as xml_grep -t '/html/head/title' xhtmlfile.xhtml > titleOfXHTMLPage.txt (the -t option gives you the result as text instead of XML).

mirod
The command I gave is meant to be run from the command line. xml_grep itself is written in Perl, but you can call it from whatever language you want, like any other tool.
mirod
+6  A: 

Command-line tools that can be called from shell scripts include:

  • 4xpath - command-line wrapper around Python's 4Suite package
  • XMLStarlet
  • xpath - command-line wrapper around Perl's XPath library

I also use xmllint and xsltproc with little XSL transform scripts to do XML processing from the command line or in shell scripts.
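
As a rough sketch of the xmllint route, assuming a build recent enough to have the --xpath option: the function name xhtml_title is made up for this example. The expression uses local-name() because XHTML elements normally sit in a default namespace, so a plain /html/head/title path would match nothing.

```shell
#!/usr/bin/env bash
# xhtml_title FILE: print the text of the first <title> element in FILE.
# (hypothetical helper name; assumes xmllint supports --xpath)
xhtml_title() {
    # string() of a node set yields the text of its first node in document order.
    xmllint --xpath "string(//*[local-name()='title'])" "$1"
}
```

Usage would then be something like: xhtml_title xhtmlfile.xhtml > titleOfXHTMLPage.txt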

Nat
A: 

Check out XML2 from http://www.ofb.net/~egnor/xml2/ which converts XML to a line-oriented format.
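
To illustrate why a line-oriented format helps, here is a small sketch that assumes xml2 prints one path=value line per node (for example /html/head/title=My Page). The helper name xml2_value is hypothetical; it only filters text on stdin, so it works on any input of that shape.

```shell
#!/usr/bin/env bash
# xml2_value PATH: read path=value lines on stdin and print the value of
# the first line whose path is exactly PATH. Returns 1 if none matches.
# (hypothetical helper; the path=value line shape is an assumption about xml2 output)
xml2_value() {
    local want=$1 line
    while IFS= read -r line; do
        # ${line%%=*} is everything before the first '='; ${line#*=} is everything after it.
        if [[ $line == *=* && ${line%%=*} == "$want" ]]; then
            printf '%s\n' "${line#*=}"
            return 0
        fi
    done
    return 1
}
```

With xml2 installed, the original task would look roughly like: xml2 < xhtmlfile.xhtml | xml2_value /html/head/title > titleOfXHTMLPage.txt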

simon04
+4  A: 

You can do that very easily using only bash. You only have to add this function:

rdom () { local IFS=\> ; read -d \< E C ;}

Now you can use rdom like read, but for XML documents. Each call assigns the next element name to the variable E and the text that follows it to the variable C.

For example, to do what you wanted to do:

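while rdom; do
    if [[ $E = title ]]; then
        echo "$C"    # quoted so whitespace in the title is preserved
        break        # stop at the first match without exiting the calling shell
    fi
done < xhtmlfile.xhtml > titleOfXHTMLPage.txt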
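
One thing to watch for: with rdom, a tag that carries attributes ends up in E together with those attributes (for example title lang="en"), so the comparison against title fails. A possible variation, only a sketch: the names read_dom, first_element_text, TAG and CONTENT are made up here, and -r is added so that backslashes are read literally.

```shell
#!/usr/bin/env bash
# read_dom: same idea as rdom, but with -r and longer variable names.
read_dom() {
    local IFS='>'
    read -r -d '<' TAG CONTENT
}

# first_element_text NAME: print the text that follows the first <NAME ...>
# opening tag on stdin. Returns 1 if no such tag is seen.
first_element_text() {
    local want=$1 TAG CONTENT
    while read_dom; do
        # Strip everything from the first whitespace on, which drops attributes:
        # 'title lang="en"' becomes 'title'.
        if [[ ${TAG%%[[:space:]]*} == "$want" ]]; then
            printf '%s\n' "$CONTENT"
            return 0
        fi
    done
    return 1
}
```

Usage: first_element_text title < xhtmlfile.xhtml > titleOfXHTMLPage.txt. This is still not a real parser: comments, CDATA sections, entities, and a > inside an attribute value will confuse it.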
Yuzem
A: 

Here's a function which will convert XML name-value pairs and attributes into bash variables.

http://www.humbug.in/2010/parse-simple-xml-files-using-bash-extract-name-value-pairs-and-attributes/

freethinker
A: 

While researching how to translate file paths in XML files between Linux and Windows formats, I found interesting tutorials and solutions on: