Hi everyone. I am new to the Linux shell and I can't make sense of regular expressions :(

Here is my question: I have a directory called /var/visitors, and under it I have directories like a, b, c, d. Each of these directories contains a file called list.xml. Here is the content of the list.xml in /var/visitors/a:

<key>Name</key>
<string>Mr Jones</string>
<key>ID</key>
<string>51</string>
<key>Len</key>
<string>53151334</string>

What I want to do is pair the Name key with its corresponding string and the ID key with its corresponding string, for every directory. I don't need the other fields. The output should look like this:

Name: Mr Jones
ID: 51
---
Name: Ms Maggie
ID: 502

Here is what I have so far:

cd /var/visitors
find -name "list.xml" | xargs grep ?????

Please help me :(( I am really stuck and I need to write this ASAP :( Thanks for your attention.

A: 

Grep alone is not going to help you here; you will need something like sed or awk.

hhafez
Well, my friend told me to use one of them. Although I read the man pages, I couldn't come up with a solution :(
GuleLim
Because grep can't do it on its own. I hope someone else can show you how to do it with sed or awk, because I'm busy right now. If I get a break and no one has answered, I'll show you how.
hhafez
okay! thanks for your attention, hhafez.
GuleLim
A: 

This is really dirty, but if you're sure the files always have this format, you could throw some Perl together to parse them, something like:

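# Read the XML line by line from standard input.
while (<STDIN>) {
  # A <key> line starts a "key: " label; the <string> line after it ends it.
  if (/<key>([^<]*)</) { print $1 . ": "; }
  if (/<string>([^<]*)</) { print $1 . "\n"; }
}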

That may not be perfect, but it should get close to what you're looking for. There is probably a Perl module that will parse XML for you, too, but for such a simple schema I think you'll be OK without one.

Lazy Bob
How can I modify this so that it ignores every key other than Name and ID?
GuleLim
The XML document seems to be pretty unstructured :) Try changing it to <entry><key>NAME</key><value>Mr Jones</value></entry>, for example. That way it's much easier to process.
Johannes Schaub - litb
I agree, the XML is badly structured. If you are able to change it, you might as well do it properly: <entry><id>51</id><name>Mr. Jones</name></entry>
Lars Haugseth
A: 

Assuming you have the file foo.bar containing the following text:

<key>Name</key>
<string>Mr Jones</string>
<key>ID</key>
<string>51</string>
<key>Len</key>
<string>53151334</string>

something like this will work:

$ awk -F '[<>]' '{if (FNR%2==1) {printf "%s: ",$3} else {print $3}}' foo.bar
Name: Mr Jones
ID: 51
Len: 53151334

If it's not entirely what you're wanting, shoe-horn it further to meet your specific requirements.
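One way to take it further, as a rough sketch rather than a tested solution: remember the most recent key and print only the strings that follow Name or ID, with a --- line between records. The function name extract_visitors is made up for illustration, and the sketch assumes every <key> line is directly followed by its <string> line, as in the sample.

```shell
# Sketch: keep only Name and ID and print --- between records.
# extract_visitors is a hypothetical name; each <key> line is assumed to be
# immediately followed by its <string> line.
extract_visitors() {
  awk -F '[<>]' '
    $2 == "key" { key = $3; next }              # remember the most recent key
    $2 == "string" && (key == "Name" || key == "ID") {
      if (key == "Name" && seen++) print "---"  # separator before every record but the first
      print key ": " $3
    }
  ' "$@"
}
```

Something like extract_visitors /var/visitors/*/list.xml should then process every directory in one go, since awk reads the files one after another.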

A: 

Not elegant, but this will work:

find -name "list.xml" | xargs cat | tr -d "\n" | sed 's/<\/string>/\n/g' | sed 's/<\/key>/: /g' | sed 's/<[^>]*>//g' | egrep "Name:|ID:" | sed 's/Name: /---\nName: /g'

Basically it does this:

  • remove all newlines
  • put each key value pair on its own line
  • add : separator
  • remove all remaining tags (everything between < and >)
  • only save Name and ID fields (drop all others)
  • add --- separator
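
The same steps can also be written one stage per line, which makes them easier to follow and to change. This is only a sketch: list_visitors is a made-up helper name, the \n in the sed replacements assumes GNU sed, and the anchored patterns assume the XML lines are not indented (as in the sample). Using find with -exec instead of xargs is a swap on my part so that paths containing spaces still work.

```shell
# Sketch only: list_visitors is a hypothetical helper name, and the \n in the
# sed replacements assumes GNU sed.
list_visitors() {
  find "$1" -name list.xml -exec cat {} + |  # concatenate every list.xml under $1
    tr -d '\n' |                             # 1. remove all newlines
    sed 's/<\/string>/\n/g' |                # 2. one key/value pair per line
    sed 's/<\/key>/: /g' |                   # 3. add the ": " separator
    sed 's/<[^>]*>//g' |                     # 4. strip the remaining tags
    grep -E '^(Name|ID): ' |                 # 5. keep only Name and ID
    sed 's/^Name: /---\nName: /'             # 6. start each record with ---
}
```

Calling list_visitors /var/visitors would then walk all the subdirectories; the order of the records depends on the order in which find visits them.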

Sample Output:

---
Name: Greg
ID: 52
---
Name: Amy
ID: 53
---
Name: Mr Jones
ID: 51
Greg Bender