Imagine I have the following XML file:

<a>before<b>middle</b>after</a>

I want to convert it into something like this:

<a>beforemiddleafter</a>

In other words, I want to take all the child nodes of a given node and move them into its parent, preserving their order. This is like running "mv ./directory/* .", but for XML nodes.

I'd like to do this using Unix command-line tools. I've been trying xmlstarlet, which is a powerful command-line XML manipulator. I tried something like this, but it doesn't work:

echo "<a>before<b>middle</b>after</a>" | xmlstarlet ed -m "//b/*" ".."

Update: XSLT templates are fine, since they can be called from the command line.

My goal here is to remove the links from an XHTML page; in other words, to replace each link with the contents of its link tag.

+2  A: 

In XSLT, you could just write:

<xsl:template match="a"><a><xsl:apply-templates /></a></xsl:template>

<xsl:template match="a/b"><xsl:value-of select="."/></xsl:template>

And you'd get:

<a>beforemiddleafter</a>

So if you wanted to do this the easy way you could just create an XSL stylesheet and run your XML file through that.

I realise this isn't what you said you'd like to do (using the Unix command line), however. I don't know anything about Unix, so maybe someone else can fill in the blanks, e.g. the command-line calls that would execute the above.

Rahul
+3  A: 

Example input file (test.xml):

<?xml version="1.0" encoding="UTF-8"?>
<test>
<x>before<y>middle</y>after</x>
<a>before<b>middle</b>after</a>
<a>before<b>middle</b>after</a>
<x>before<y>middle</y>after</x>
<a>before<b>middle</b>after</a>
<embedded>foo<a>before<b>middle</b>after</a>bar</embedded>
</test>

XSLT stylesheet (collapse.xsl):

 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="@*|node()">
     <xsl:copy>
       <xsl:apply-templates select="@*|node()"/>
     </xsl:copy>
   </xsl:template>

   <xsl:template match="a">
     <xsl:copy>
       <xsl:value-of select="."/>
     </xsl:copy>
   </xsl:template>

 </xsl:stylesheet>

Run with XmlStarlet using

xml tr collapse.xsl test.xml

Produces:

<?xml version="1.0"?>
<test>
<x>before<y>middle</y>after</x>
<a>beforemiddleafter</a>
<a>beforemiddleafter</a>
<x>before<y>middle</y>after</x>
<a>beforemiddleafter</a>
<embedded>foo<a>beforemiddleafter</a>bar</embedded>
</test>

The first template in the stylesheet is the basic identity transformation (it just copies the whole of your input XML document). The second template specifically matches the elements that you want to 'collapse': it copies the element's tags and inserts the string value of the element (the concatenation of the string values of all its descendant text nodes).

GerG
A: 

Have you tried this?

file.xml

<r>
    <a>start<b>middle</b>end</a>
</r>

template.xsl

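<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes"/>
 <xsl:template match="/">
  <a><xsl:value-of select="r/a" /></a>
 </xsl:template>
</xsl:stylesheet>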

output

<a>startmiddleend</a>
michal kralik
+2  A: 

If your actual goal is to remove the links from a web page, then you should use a stylesheet like this, which matches all XHTML <a> elements (I'm assuming you're using XHTML?) and simply applies templates to their content:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="h">

<!-- Don't copy the <a> elements, just process their content -->
<xsl:template match="h:a">
  <xsl:apply-templates />
</xsl:template>

<!-- identity template; copies everything by default -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

This stylesheet will deal with a situation where you have something nested within the <a> element that you want to retain, such as:

<p>Here is <a href="....">some <em>linked</em> text</a>.</p>

which you will want to come out as:

<p>Here is some <em>linked</em> text.</p>

And it will deal with the situation where you have the link nested within an unexpected element between the usual parent (the <p> element) and the <a> element, such as:

<p>Here is <em>some <a href="...">linked</a> text</em>.</p>
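
For comparison, the same "drop the element, keep its content" behaviour can be sketched outside XSLT. The following is a rough Python illustration using only the standard library's xml.etree.ElementTree; it is not the stylesheet's implementation, and the helper names unwrap and _append_text are invented for this example. It assumes un-namespaced input; for real XHTML the tag would have to be passed in its qualified form, "{http://www.w3.org/1999/xhtml}a".

```python
import xml.etree.ElementTree as ET

def _append_text(parent, index, text):
    """Attach text immediately before child position `index` of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text

def unwrap(root, tag):
    """Remove every `tag` element below root, keeping its content in place."""
    while True:
        # Search again after each change, because the tree has been modified.
        hit = next(((p, c) for p in root.iter() for c in p if c.tag == tag), None)
        if hit is None:
            return
        parent, el = hit
        i = list(parent).index(el)
        children = list(el)
        parent.remove(el)
        _append_text(parent, i, el.text)                  # text before the first child
        for offset, child in enumerate(children):
            parent.insert(i + offset, child)              # children keep their order
        _append_text(parent, i + len(children), el.tail)  # text after the link

p = ET.fromstring('<p>Here is <a href="....">some <em>linked</em> text</a>.</p>')
unwrap(p, "a")
print(ET.tostring(p, encoding="unicode"))
# prints: <p>Here is some <em>linked</em> text.</p>
```

The extra bookkeeping is needed because ElementTree stores text on the neighbouring elements (text and tail) rather than as separate nodes; the XSLT version gets this for free from apply-templates.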
JeniT
This is exactly what I was looking for. Thanks for the detailed answer explaining how it all works.
Rory
A: 

Using xmlstarlet:

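# Sample document held in a shell variable
xmlstr='<a>before<b>middle</b>after</a>'
# String value of <b>'s parent (all the text under <a>); sed keeps only the first line
updatestr="$(echo "$xmlstr" | xmlstarlet sel -T -t -m "/a/b" -v '../.' -n | sed -n '1{p;q;}')"
# Replace the content of <a> with that flattened text
echo "$xmlstr" | xmlstarlet ed -u "/a" -v "$updatestr"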
lmxy