Hi, I want to replace the value of the href attributes in HTML using XSLT. For example, if the anchor tag is <a href="/dir/file1.htm" />, I want to replace the href value so that it becomes <a href="http://site/dir/file1.htm" />. The point is that I want to replace all the relative URLs with absolute ones.

I want to do this for all the anchor tags in the HTML content. How can I do this using XSLT?

Thanks.

EDIT: This is for Google Appliance. I display the results in a frame, and the links don't work in the cached page because relative links are resolved against the URL in the address bar. Here the HTML is in the form of a string, and which HTML is displayed depends on a condition. Can someone suggest a way to replace all the href values in the string?

A:

I. XSLT 1.0 solution:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="pServerName" select="'http://MyServer'"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="a/@href[not(starts-with(.,'http://'))]">
  <xsl:attribute name="href">
   <xsl:value-of select="concat($pServerName, .)"/>
  </xsl:attribute>
 </xsl:template>
</xsl:stylesheet>

when applied to this XML document:

<html>
 <a href="/dir/file1.htm">Link 1</a>
 <a href="/dir/file2.htm">Link 2</a>
 <a href="/dir/file3.htm">Link 3</a>
</html>

produces the wanted, correct result:

<html>
    <a href="http://MyServer/dir/file1.htm">Link 1</a>
    <a href="http://MyServer/dir/file2.htm">Link 2</a>
    <a href="http://MyServer/dir/file3.htm">Link 3</a>
</html>
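A note on the predicate: not(starts-with(.,'http://')) also matches https:, mailto: and #fragment links, and for a document-relative href such as next.htm the concat() would produce http://MyServernext.htm. A minimal variant, assuming that only root-relative hrefs (those starting with a single '/', as in the question) should be rewritten, narrows the match:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <!-- Assumed server prefix; pass the real one in as a parameter -->
 <xsl:param name="pServerName" select="'http://MyServer'"/>

 <!-- Identity transform: copy everything unchanged by default -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Rewrite only root-relative hrefs such as "/dir/file1.htm".
      Protocol-relative "//host/..." hrefs, https:, mailto:, "#fragment"
      and document-relative hrefs fall through to the identity template -->
 <xsl:template match="a/@href[starts-with(., '/') and not(starts-with(., '//'))]">
  <xsl:attribute name="href">
   <xsl:value-of select="concat($pServerName, .)"/>
  </xsl:attribute>
 </xsl:template>
</xsl:stylesheet>
```

For the input above this gives the same result; an https:// or #top link would be copied through unchanged.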

II. XSLT 2.0 solution:

In XPath 2.0 one can use the standard function resolve-uri(), which resolves a relative URI reference against a base URI.

This transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output method="text"/>

 <xsl:variable name="vBaseUri" select="'http://Myserver/ttt/x.xsl'"/>

 <xsl:template match="/">
  <xsl:value-of select="resolve-uri('/mysite.aspx', $vBaseUri)"/>
 </xsl:template>
</xsl:stylesheet>

when applied to any XML document (the input is not used), produces the wanted, correct result:

http://Myserver/mysite.aspx

If the stylesheet module is served from the same server as the relative URLs to be resolved, there is no need to hard-code the base URI or pass it in a parameter; the stylesheet's own base URI can be used, and the following produces the wanted result:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output method="text"/>

 <xsl:variable name="vBaseUri">
  <xsl:for-each select="document('')">
   <xsl:sequence select="resolve-uri('')"/>
  </xsl:for-each>
 </xsl:variable>


 <xsl:template match="/">
  <xsl:value-of select="resolve-uri('/mysite.aspx', $vBaseUri)"/>
 </xsl:template>
</xsl:stylesheet>
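Applied to the original problem, resolve-uri() can replace the concat() logic entirely. A sketch, assuming an XSLT 2.0 processor and a hypothetical parameter pBaseUri holding the address of the page whose links are being fixed (the page name is made up for illustration):

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <!-- Hypothetical base URI: the URL of the page the links belong to -->
 <xsl:param name="pBaseUri" select="'http://site/dir/page.htm'"/>

 <!-- Identity transform: copy everything unchanged by default -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- resolve-uri() handles "/x", "x" and "../x" alike and returns
      hrefs that are already absolute URIs unchanged -->
 <xsl:template match="a/@href">
  <xsl:attribute name="href" select="resolve-uri(., $pBaseUri)"/>
 </xsl:template>
</xsl:stylesheet>
```

With that base, "/dir/file1.htm" resolves to "http://site/dir/file1.htm" and "next.htm" to "http://site/dir/next.htm"; absolute hrefs, including mailto: links, come back as they are, while an href that is not a valid URI reference may raise an error.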
Dimitre Novatchev
Hi, thanks for the answer. I just want to know how to apply this template if the HTML is in the form of a string. I have different pieces of HTML and choose which one to display based on a condition: for example, I display the normal page for a regular search and the cached HTML when the user requests the cached page. BTW, this is for Google Appliance.
Sridhar
@Sridhar You have to read the documentation of the particular XSLT processor you are using -- this is different for the different XSLT processors.
Dimitre Novatchev
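On the follow-up about HTML held in a string: XSLT 1.0 has no standard function that turns a string into nodes, so with a 1.0 processor this does depend on processor-specific extensions. If an XSLT 3.0 processor is available and the string is well-formed XML with a single root element in no namespace, the standard parse-xml() function can be used instead. A sketch, where pHtmlString and pBaseUri are hypothetical parameters:

```xml
<xsl:stylesheet version="3.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html"/>

 <!-- Hypothetical parameters: the cached page as a string, and the base
      URI its relative links should be resolved against -->
 <xsl:param name="pHtmlString"
   select="'&lt;html&gt;&lt;a href=&quot;/dir/file1.htm&quot;&gt;Link 1&lt;/a&gt;&lt;/html&gt;'"/>
 <xsl:param name="pBaseUri" select="'http://site/'"/>

 <!-- In this mode, nodes without a more specific template are copied -->
 <xsl:mode name="fix-links" on-no-match="shallow-copy"/>

 <!-- The input document is not used: parse the string, process the result -->
 <xsl:template match="/">
  <xsl:apply-templates select="parse-xml($pHtmlString)" mode="fix-links"/>
 </xsl:template>

 <!-- Make every href absolute; already-absolute URIs come back unchanged -->
 <xsl:template match="a/@href" mode="fix-links">
  <xsl:attribute name="href" select="resolve-uri(., $pBaseUri)"/>
 </xsl:template>
</xsl:stylesheet>
```

HTML that is not well-formed XML (for example an unclosed <br>) makes parse-xml() raise an error, and XHTML in the XHTML namespace would need a namespace-prefixed match pattern.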
A: 

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="pDirectoryPath" select="'http://site.org/dir'"/>
    <xsl:variable name="vSitePath" select="concat(
                                            substring-before(
                                                   $pDirectoryPath,
                                                   '//'),
                                            '//',
                                            substring-before(
                                                 substring-after(
                                                       $pDirectoryPath,
                                                       '//'),
                                                 '/'))"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="a/@href[starts-with(.,'/')]" priority="1">
        <xsl:attribute name="href">
            <xsl:value-of select="concat($vSitePath,.)"/>
        </xsl:attribute>
    </xsl:template>
    <xsl:template match="a/@href[not(contains(.,'://'))]">
        <xsl:attribute name="href">
            <xsl:value-of select="concat($pDirectoryPath,'/',.)"/>
        </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>

With this input:

<html>
    <body>
        <h4>Headline</h4>
        <p>Root relative link <a href="/image/image1.jpg" /></p>
        <p>Relative link <a href="next.htm" /></p>
        <p>Absolute Link <a href="http://site.org/dir/file1.htm" /></p>
    </body>
</html>

Output:

<html>
    <body>
       <h4>Headline</h4>
       <p>Root relative link <a href="http://site.org/image/image1.jpg"></a></p>
       <p>Relative link <a href="http://site.org/dir/next.htm"></a></p>
       <p>Absolute Link <a href="http://site.org/dir/file1.htm"></a></p>
    </body>
</html>

Edit: Added an example of a root-relative path and a real relative path.
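Both rewrite templates in this stylesheet also catch hrefs that should probably stay as they are: "#top" and "mailto:..." contain no "://", and a protocol-relative "//host/path" starts with "/". One way to exclude them without editing the stylesheet, assuming it is saved under the hypothetical name absolutize-links.xsl, is to import it and override those cases, since templates in the importing module take precedence over imported ones:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Hypothetical file name for the stylesheet above; its identity
         template, rewrite templates and pDirectoryPath param still apply -->
    <xsl:import href="absolutize-links.xsl"/>

    <!-- Higher import precedence than the imported rewrite templates:
         copy "#fragment", mailto: and protocol-relative "//host/..."
         hrefs unchanged -->
    <xsl:template match="a/@href[starts-with(.,'#')
                                 or starts-with(.,'mailto:')
                                 or starts-with(.,'//')]">
        <xsl:copy/>
    </xsl:template>
</xsl:stylesheet>
```

Other schemes such as javascript: or tel: would need the same treatment.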

Alejandro