I want to use the Wikipedia API to find the French pages that include ''Template:Infobox Scientifique'' but are missing from the English version. My idea was to process the following document with XProc:

http://fr.wikipedia.org/w/api.php?action=query&format=xml&list=embeddedin&eititle=Template:Infobox%20Scientifique&eilimit=400

with the following XSLT stylesheet:

<?xml version='1.0' ?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >
<xsl:output method='text' indent="yes"/> 
<xsl:template match="/">
<xsl:apply-templates select="api"/>
</xsl:template>

<xsl:template match="api">
<xsl:for-each select="query/embeddedin/ei">
<xsl:variable name="title" select="translate(@title,&apos; &apos;,&apos;_&apos;)"/>
<xsl:variable name="english-title">
<xsl:call-template name="englishTitle"><xsl:with-param name="title" select="@title"/></xsl:call-template>
</xsl:variable>

<xsl:value-of select="$english-title"/><xsl:text>
</xsl:text>

</xsl:for-each>
</xsl:template>

<xsl:template name="englishTitle">
<xsl:param name="title"/>
<xsl:variable name="uri1" select="concat(&apos;http://fr.wikipedia.org/w/api.php?action=query&amp;format=xml&amp;prop=langlinks&amp;lllimit=500&amp;titles=&apos;,translate($title,&apos; &apos;,&apos;_&apos;))"/>
<xsl:message><xsl:value-of select="$uri1"/></xsl:message>
<xsl:message>count=<xsl:value-of select="count(document($uri1,/api/query/pages/page/langlinks/ll))"/></xsl:message>
</xsl:template>

</xsl:stylesheet>

The XSLT extracts all the articles containing the template, and for each article I want to call Wikipedia to get the language links between the wikis. Here, the template englishTitle calls the XSLT function document().

But it always reports count=1, even though there are plenty of ll nodes (e.g. http://fr.wikipedia.org/w/api.php?action=query&format=xml&prop=langlinks&lllimit=500&titles=Carl_Sagan).

Can't I process the nodes returned by the document() function?

+1  A: 

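The path has to be applied to the result of document(), not passed to it. The second argument of document() only supplies a base URI for resolving a relative URI, so your expression counts the single root node that is returned. You should try: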

<xsl:value-of select="count(document($uri1)/api/query/pages/page/langlinks/ll)"/>
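
To see the difference in isolation, here is a minimal sketch that prints both counts side by side. The `uri` parameter and its default value (the Carl_Sagan example URL from the question) are assumptions for illustration, and running it requires a processor that is allowed to fetch remote documents.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter: any absolute URI that returns a langlinks response -->
  <xsl:param name="uri"
    select="'http://fr.wikipedia.org/w/api.php?action=query&amp;format=xml&amp;prop=langlinks&amp;lllimit=500&amp;titles=Carl_Sagan'"/>

  <xsl:template match="/">
    <!-- Two-argument form, as in the question: the node-set is only a base
         for resolving relative URIs. A string first argument yields at most
         one root node, so this count cannot exceed 1. -->
    <xsl:text>path as second argument: </xsl:text>
    <xsl:value-of select="count(document($uri, /api/query/pages/page/langlinks/ll))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- One-argument form: the path is evaluated inside the loaded document,
         so this counts the ll elements themselves. -->
    <xsl:text>path after document(): </xsl:text>
    <xsl:value-of select="count(document($uri)/api/query/pages/page/langlinks/ll)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet ignores its input document, so it can be applied to any well-formed XML file.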

On a different note - what is

translate(@title,&apos; &apos;,&apos;_&apos;)

supposed to mean? What's wrong with:

translate(@title, ' ', '_')

There is no need to escape a quote character in an XML attribute value unless it is the same type of quote that delimits that value. Both of these are valid:

name="foo&quot;'foo"
name='foo&apos;"foo'
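
Applied to the translate() call, a small sketch of the equivalent spellings might look like this. The literal 'Carl Sagan' is only a stand-in for @title so that the stylesheet runs against any input; each of the three expressions should print Carl_Sagan.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Double-quoted attribute, single-quoted XPath string literals -->
    <xsl:value-of select="translate('Carl Sagan', ' ', '_')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Single-quoted attribute, double-quoted XPath string literals -->
    <xsl:value-of select='translate("Carl Sagan", " ", "_")'/>
    <xsl:text>&#10;</xsl:text>

    <!-- Escaped form: legal, but the escaping is unnecessary because the
         attribute is delimited by double quotes -->
    <xsl:value-of select="translate(&apos;Carl Sagan&apos;, &apos; &apos;, &apos;_&apos;)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```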

Your entire transformation can be reduced to something like this:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="text" /> 

  <xsl:param name="baseUrl" select="'http://fr.wikipedia.org/w/api.php?action=query&amp;format=xml&amp;prop=langlinks&amp;lllimit=500&amp;titles='" />

  <xsl:template match="ei">
    <xsl:variable name="uri" select="concat($baseUrl ,translate(@title,' ','_'))"/>
    <xsl:variable name="doc" select="document($uri)"/>

    <xsl:value-of select="$uri"/>
    <xsl:text>&#10;</xsl:text>

    <xsl:text>count=</xsl:text>
    <xsl:value-of select="count($doc/api/query/pages/page/langlinks/ll)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="text()" />  
</xsl:stylesheet>

Let the XSLT default templates work for you. They do all of the recursion in the background; all you have to do is catch the nodes you want to process, and suppress unwanted text output by overriding the default text() template with an empty one.
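
For reference, the default behaviour can be written out explicitly. The sketch below spells out the built-in rule for the root and element nodes as described in the XSLT 1.0 specification, next to the empty text() override; the `ei` template that prints each title is just an illustrative payload.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Explicit copy of the built-in rule for the root and for elements:
       do nothing except recurse into the children. Leaving this template
       out gives the same behaviour. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- The built-in rule for text nodes would copy the text to the output;
       this empty override suppresses it. -->
  <xsl:template match="text()"/>

  <!-- The only nodes of interest. A name test has a higher default
       priority than the wildcard above, so this template wins for ei. -->
  <xsl:template match="ei">
    <xsl:value-of select="@title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the embeddedin response from the question, this should print one page title per line without any explicit for-each.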

Tomalak
Thanks, it worked :-)
Pierre