Does anybody know what command or bash script I can use to print the values of the title attributes in all the XML files in the current directory?

I'm using Cygwin, and some of my file names contain whitespace.

(I've been googling around, and there are a lot of suggestions to download other utilities. I'd prefer to avoid that if I can. For example, I installed sgrep and then got this error: `sh: m4: command not found system("m4 -s") returned non zero exit status (32512). Preprocessor returned empty file`)

If there is an XPath program that is free to download for Windows and can be used as a standalone search program, that would be great too =)

Thanks in advance for helping out / T

A: 

Do you have xml_grep installed? It is free and came standard with my CentOS install. It takes an XPath expression and prints the results.

frankc
Thanks for answering. Is there an xml_grep for Cygwin? I've been googling for it but cannot find it. / T
Tony
A: 

If the tag and its title attribute are on the same line, and there are line feeds between the different instances of your tag, the following could work for you. For example:

<mytag someAttr="blah" Title="The Title goes here" ...

Then you could do something like the following in order to find the tags of interest that contain a Title attribute:

grep -ro '<mytag[[:space:]].*Title="[^"]*"' /path/to/directory/to/search

Alternatively, you should be able to use find and xargs:

find /your/search/path -iname '*.xml' -print0 | \
    xargs -0 -r grep -ro '<mytag[[:space:]].*Title="[^"]*"'

Now that you have the correct tag together with its Title attribute, you only want the attribute's value. Use grep's -o option to output only the text that matches the regular expression, then cut to extract the value of the Title:

grep -ro '<mytag[[:space:]].*Title="\([^"]*\)"' /path/to/directory/to/search | \
    grep -o 'Title="[^"]*"' | cut -f2 -d'"'
Kaleb Pederson
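For the file names with whitespace mentioned in the question, the find and xargs variant can be combined with the two-stage grep from the answer. This is only a minimal sketch, assuming GNU find, xargs and grep (the versions Cygwin normally ships). The temporary directory, the file name and the `mytag` element are made up for the demo, and `[^>]*` is used in place of `.*` so that a match cannot run past the end of the tag (it would still be fooled by a `>` inside an attribute value).

```shell
#!/bin/sh
# Hypothetical demo data: one XML file whose name contains a space.
demo_dir=$(mktemp -d)
printf '%s\n' \
  '<root>' \
  '<mytag someAttr="blah" Title="First title"/>' \
  '<other Title="Not wanted"/>' \
  '<mytag Title="Second title"/>' \
  '</root>' > "$demo_dir/my file.xml"

# -print0 and -0 pass file names NUL-separated, so spaces are safe.
# First grep: keep only mytag elements that carry a Title (-h drops file names).
# Second grep plus cut: reduce each hit to the bare attribute value.
titles=$(find "$demo_dir" -name '*.xml' -print0 |
  xargs -0 grep -ho '<mytag[[:space:]][^>]*Title="[^"]*"' |
  grep -o 'Title="[^"]*"' | cut -d'"' -f2)

printf '%s\n' "$titles"
rm -rf "$demo_dir"
```

To run it against real files, replace `"$demo_dir"` with the directory to search and `mytag` with the actual element name.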
Thank you for taking the time, but when I ran the command you suggested I got a whole novel in return, so unfortunately it's not working. Could it be because there are no line breaks in my files? Best regards / T
Tony
If there are no line feeds between `mytags`, then this will **not** work as expected. `xmllint --format` could be used to format the document if it's not whitespace critical. I've updated my post to clarify and better match your question.
Kaleb Pederson
Aha, I'm logged in automatically in only one of my browsers. Kaleb, thanks a lot. Your grep command has helped me. Actually, this part of it was enough for my needs: grep -o 'Title="[^"]*"' *.xml | cut -f2 -d'"'
Tony
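The shorter command from the last comment works even without line breaks because grep -o prints each match on its own line, however many there are per input line. A small sketch of that behaviour, assuming GNU grep. The demo files and their contents are invented, and -h is added here only to suppress the `file:` prefix grep prints when given several files.

```shell
#!/bin/sh
# Hypothetical demo data: no line breaks between tags, a space in one file name.
demo_dir=$(mktemp -d)
printf '%s' '<r><a Title="One"/><b Title="Two words"/></r>' > "$demo_dir/no breaks.xml"
printf '%s' '<r><c Title="Three"/></r>' > "$demo_dir/second.xml"

# The shell expands *.xml after word splitting, so names with spaces stay whole.
# -o prints every match on its own line; -h omits the file name prefix.
values=$(cd "$demo_dir" && grep -ho 'Title="[^"]*"' *.xml | cut -d'"' -f2)

printf '%s\n' "$values"
rm -rf "$demo_dir"
```

Note that the pattern is purely textual: it would also pick up any other attribute whose name ends in `Title`, and it is case-sensitive.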