After giving up on PHP, I'm trying to extract data from the XML using XSLT.

How do I match the data in <jskit:attribute key="permalink" ...>?

I saw another similar question about name-value pairs, but I'm one step away from that.

My XML is given here: xml source

A: 

The XPath expressions for the nodes in question are as follows:

/rss/channel/item/jskit:attribute[@key='IP']/@value

/rss/channel/item/jskit:attribute[@key='permalink']/@value

/rss/channel/item/jskit:parent-guid

The first two use predicates to select the jskit:attribute elements whose key attribute equals 'IP' or 'permalink', and then take their value attribute. The third is a plain path selecting a single element.

To access these namespaced nodes, you need to declare the namespace prefixes in your XSLT. This is done on the <xsl:stylesheet> element. For this application you can simply copy the declarations from your XML source. What has to match is the namespace URI; the prefix in the stylesheet is only a local alias.
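To try the three expressions outside a browser, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is only an approximation: ElementTree supports a limited subset of XPath and cannot select an attribute node such as @value, so the sketch matches the element and then reads the attribute from it. The tiny feed is hypothetical (element names from the question, made-up values), and the prefix "js" deliberately differs from the feed's "jskit".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: element names follow the question, values are made up.
FEED = """<rss version="2.0" xmlns:jskit="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <jskit:parent-guid>guid-123</jskit:parent-guid>
      <jskit:attribute key="IP" value="192.0.2.10"/>
      <jskit:attribute key="permalink" value="http://example.com/post#c1"/>
    </item>
  </channel>
</rss>"""

# The prefix is only a local alias; what must match is the namespace URI.
NS = {"js": "http://purl.org/dc/elements/1.1/"}


def extract(feed: str) -> dict:
    root = ET.fromstring(feed)  # root is the <rss> element
    # Paths are relative to <rss>, i.e. the equivalent of /rss/channel/item/...
    ip = root.find("channel/item/js:attribute[@key='IP']", NS)
    link = root.find("channel/item/js:attribute[@key='permalink']", NS)
    guid = root.find("channel/item/js:parent-guid", NS)
    # ElementTree cannot select @value itself, so read it from the matched element.
    return {
        "ip": ip.get("value"),
        "permalink": link.get("value"),
        "parent-guid": guid.text,
    }


print(extract(FEED))
```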

A full example in XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:media="http://search.yahoo.com/mrss/"
    xmlns:jskit="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="atom media jskit dc">

    <xsl:template match="/">
        <!-- wrap the output in one root element ("result" is an arbitrary name);
             three sibling top-level elements would not be well-formed XML -->
        <result>
            <!-- in XSLT 1.0, value-of returns the value of the first match only -->
            <ip>
                <xsl:value-of
                    select="/rss/channel/item/jskit:attribute[@key='IP']/@value"/>
            </ip>
            <permalink>
                <xsl:value-of
                    select="/rss/channel/item/jskit:attribute[@key='permalink']/@value"/>
            </permalink>
            <parent-guid>
                <xsl:value-of
                    select="/rss/channel/item/jskit:parent-guid"/>
            </parent-guid>
        </result>
    </xsl:template>
</xsl:stylesheet>

To attach the stylesheet to the XML file, add an xml-stylesheet processing instruction right after the XML declaration (the href below assumes the stylesheet is saved as rss.xsl next to the XML file):

<?xml version="1.0"?>
<?xml-stylesheet href="rss.xsl" type="text/xsl" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:media="http://search.yahoo.com/mrss/"
    xmlns:jskit="http://purl.org/dc/elements/1.1/" >
    <!-- ... rest of the feed unchanged ... -->
</rss>
Peter Lindqvist
@peter .. thx so much for the full example. If I attach the XSL to the XML I get the error: error on line 1 at column 25: Extra content at the end of the document
vk123
The XSLT goes into a .xsl file and the XML into a .xml file. To attach the XSLT to the XML, use something like this: <?xml-stylesheet href="rss.xsl" type="text/xsl" ?>
Peter Lindqvist
Thanks Peter and Roland! Getting something working finally!
vk123
A:

First, make sure that all namespaces you want to use are available in the XSLT. For example, declare xmlns prefixes for all of them on the top-level element, like so:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:jskit="http://purl.org/dc/elements/1.1/"
>
    <!-- XSLT templates go here -->
</xsl:stylesheet>

After that, you can use the namespace prefix jskit in your XPath expressions.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:jskit="http://purl.org/dc/elements/1.1/"
>
    <!-- plain text output: one "permalink, IP" line per item -->
    <xsl:output method="text"/>

    <!-- process only the items; otherwise the built-in templates
         would copy the text of every other element to the output -->
    <xsl:template match="/">
        <xsl:apply-templates select="//item"/>
    </xsl:template>

    <!-- matches each item selected above -->
    <xsl:template match="item">
         <!-- get the "value" attribute of the "jskit:attribute" element
              in the current item whose "key" attribute is "permalink"
         -->
         <xsl:value-of select="jskit:attribute[@key='permalink']/@value"/>
         <!-- comma separator, literal text -->
         <xsl:text>, </xsl:text>
         <!-- get the "value" attribute of the "jskit:attribute" element
              in the current item whose "key" attribute is "IP"
         -->
         <xsl:value-of select="jskit:attribute[@key='IP']/@value"/>
         <!-- newline after each item -->
         <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
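The same per-item logic (visit each item, then evaluate relative paths with that item as the context node) can be sketched in Python with xml.etree.ElementTree. The two-item feed is hypothetical, with made-up values; the extra <title> element is there to show that only items contribute to the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed; structure follows the question, values are made up.
FEED = """<rss version="2.0" xmlns:jskit="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Comments</title>
    <item>
      <jskit:attribute key="permalink" value="http://example.com/a#c1"/>
      <jskit:attribute key="IP" value="192.0.2.1"/>
    </item>
    <item>
      <jskit:attribute key="permalink" value="http://example.com/b#c2"/>
      <jskit:attribute key="IP" value="192.0.2.2"/>
    </item>
  </channel>
</rss>"""

NS = {"jskit": "http://purl.org/dc/elements/1.1/"}


def item_lines(feed: str) -> list:
    root = ET.fromstring(feed)
    lines = []
    # Like the "item" template: visit every <item> anywhere in the document...
    for item in root.iter("item"):
        # ...and evaluate the relative paths with that item as the context.
        link = item.find("jskit:attribute[@key='permalink']", NS)
        ip = item.find("jskit:attribute[@key='IP']", NS)
        lines.append(link.get("value") + ", " + ip.get("value"))
    return lines


for line in item_lines(FEED):
    print(line)
```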

The templates and XPath expressions you need to write depend entirely on your desired output format, and this is hard to answer in full without that information.

But say you want to insert the data into a database. You could then generate SQL statements directly with XSLT:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:jskit="http://purl.org/dc/elements/1.1/"
>
    <!-- make text output (SQL script) -->
    <xsl:output method="text"/>

    <!-- only process the items; otherwise the built-in templates
         would copy the text of all other elements into the script -->
    <xsl:template match="/">
        <xsl:apply-templates select="//item"/>
    </xsl:template>

    <!-- one INSERT statement per item -->
    <xsl:template match="item">
         INSERT INTO myTable(permalink, IP) VALUES
         ('<xsl:value-of select="jskit:attribute[@key='permalink']/@value"/>'
         ,'<xsl:value-of select="jskit:attribute[@key='IP']/@value"/>');
    </xsl:template>
</xsl:stylesheet>

However, to make this kind of thing really robust, you have to make sure that the values that end up in the SQL statement don't contain any unescaped string delimiters. If, for example, the permalink could contain a single quote, this would generate invalid SQL (or worse, open the door to SQL injection), because the single quote from the value would prematurely end the SQL string literal.

To counter that, you can write a template that recursively processes the text values and escapes the characters that need it (besides the single quote, depending on your database you may need to escape other characters too).
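The recursion such a template performs is simple: emit substring-before() the first quote, emit the escaped quote, then recurse on substring-after(). Here is a minimal sketch of that logic in Python, assuming the database accepts the standard SQL escape of doubling a single quote. insert_statement() and the sample values are hypothetical, and other characters (for example backslashes under some MySQL settings) are not handled.

```python
def escape_sql_quotes(text: str) -> str:
    """Double every single quote, mirroring a recursive XSLT 1.0 template
    built on substring-before() / substring-after()."""
    before, quote, after = text.partition("'")
    if not quote:
        # Base case: no quote left, emit the rest unchanged.
        return text
    # Part before the quote, the escaped quote, then recurse on the rest.
    # (Python's recursion limit makes this unsuitable for huge inputs.)
    return before + "''" + escape_sql_quotes(after)


def insert_statement(permalink: str, ip: str) -> str:
    # Hypothetical helper; table and column names follow the XSLT example.
    return ("INSERT INTO myTable(permalink, IP) VALUES ('%s', '%s');"
            % (escape_sql_quotes(permalink), escape_sql_quotes(ip)))


print(insert_statement("http://example.com/it's#c1", "192.0.2.10"))
```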

Another approach would be to use XSLT to convert the data into a format you can easily parse in your host language. Say you are using PHP: you could use XSLT to convert the XML to INI file format, parse that with parse_ini_file(), and then use PHP to properly escape/validate the values and perform the database actions.
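A sketch of the INI route, with Python's configparser standing in for PHP's parse_ini_file() and an in-memory sqlite3 database standing in for the real one. The INI layout (one section per item, lowercase permalink and ip keys) is an assumption about what the XSLT would emit. Instead of escaping by hand, the inserts use placeholders, which leave the quoting to the database driver.

```python
import configparser
import sqlite3

# Hypothetical INI text, as a text-output XSLT stylesheet might emit it:
# one section per item, holding the values the XPath expressions extract.
INI_TEXT = """
[item1]
permalink = http://example.com/it's#c1
ip = 192.0.2.1

[item2]
permalink = http://example.com/b#c2
ip = 192.0.2.2
"""


def load_items(ini_text: str) -> list:
    # interpolation=None so a '%' in a URL is not treated as a placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return [(parser[s]["permalink"], parser[s]["ip"]) for s in parser.sections()]


def store(items: list) -> list:
    conn = sqlite3.connect(":memory:")  # stands in for the real database
    conn.execute("CREATE TABLE myTable (permalink TEXT, IP TEXT)")
    # "?" placeholders: the driver handles quoting, so "it's" is safe.
    conn.executemany("INSERT INTO myTable (permalink, IP) VALUES (?, ?)", items)
    rows = conn.execute(
        "SELECT permalink, IP FROM myTable ORDER BY rowid").fetchall()
    conn.close()
    return rows


print(store(load_items(INI_TEXT)))
```

In PHP, PDO prepared statements play the same role as the ? placeholders here.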

YMMV

Roland Bouman
A: 

After giving up on PHP I'm trying to extract data from the xml using XSLT

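What's to give up on? PHP has good support for XML. You can use DOMXPath to query an XML document. Call registerNamespace() first, so that you can query for namespaced elements. E.g.:

// $xml is assumed to hold the feed's XML as a string
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
// the prefix must map to the namespace URI used in the feed
$xpath->registerNamespace("jskit", "http://purl.org/dc/elements/1.1/");
// select the value attribute of every permalink entry
$query = "//jskit:attribute[@key='permalink']/@value";
foreach ($xpath->evaluate($query) as $node) {
  // nodes are DOMAttr objects; echoing the node itself would fail
  echo $node->nodeValue, "\n";
}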
troelskn
That answer would be perfect in the original question http://stackoverflow.com/questions/1977072/rss-xml-namespace-confusion
Peter Lindqvist