I have the following XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<example>
    <contactInfo>
        <id>12319221</id>
        <name>Jerry P</name>
        <market>
            <name>Test</name>
            <phone>800.555.1010</phone>
        </market>
        <agent>
            <name>Test User</name>
            <email>[email protected]</email>
        </agent>
        <summary>&amp;#8220;Jerry just gets it!&amp;#8221;</summary>
    </contactInfo>
</example>

I encode special characters as HTML entities when I save this XML document, which is why the smart quotes are encoded as &#8220; and &#8221;.

And I use an XSL, via Java/Xalan, to transform the xml document to html:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="wsHost"></xsl:param>
<xsl:param name="serverId"></xsl:param>

<xsl:template match="/showcase">
    <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <title>Example</title>
        </head>
        <body>
            <div id="profile">
                <xsl:apply-templates/>
            </div>
        </body>
    </html>
</xsl:template>

<!-- Contact Info section -->
<xsl:template match="/example/contactInfo">
    <span class="sectionTitle">Contact Info:</span>
    <div id="contactInfo">
        <xsl:if test="name">
            <strong>Candidate Name:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="name" /><br />
        </xsl:if>

        <xsl:if test="id">
            <strong>Candidate ID:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="id" /><br />
        </xsl:if>

        <xsl:if test="market">
            <xsl:if test="market/name">
                <strong>Market Name:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="market/name" /><br />
            </xsl:if>

            <xsl:if test="market/phone">
                <strong>Market Phone:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="market/phone" /><br />
            </xsl:if>
        </xsl:if>

        <xsl:if test="agent">
            <xsl:if test="agent/name">
                <strong>Agent Name:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="agent/name" /><br />
            </xsl:if>

            <xsl:if test="agent/email">
                <strong>Agent Email:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="agent/email" /><br />
            </xsl:if>
        </xsl:if>

        <xsl:if test="summary">
                <strong>Summary:</strong>&#160;<xsl:value-of disable-output-escaping="yes" select="summary" /><br />
        </xsl:if>
    </div>
    <hr size="1" noshade="noshade" class="rule" />
</xsl:template>
</xsl:stylesheet>

The HTML that results from the transform is then written to the browser. This is where I'm noticing a character encoding issue. The &#160; characters (the nbsp numeric value) show up as either black diamond question marks (Firefox) or a box character (IE), and so do the entities that were previously encoded (“ / ”).

Also, maybe the biggest hint of all: when I transform this XML file on a Linux platform (then write the HTML to Firefox), everything appears correctly. The character encoding issues occur only when the transform is done on Windows (in both Firefox and IE).

Am I encoding the entities incorrectly, or maybe not specifying a character set somewhere?

A: 

Well, you haven't set the encoding in the HTML document, for one. I don't know if that's the issue, but it would be my first attempt at a fix.

Try adding:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

to your head.

prodigitalson
Ok, so I've added: <xsl:output method="html" standalone="yes" version="1.0" encoding="UTF-8" indent="yes"/> and now the resulting HTML has a meta tag with the content type: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">. I want to be able to render HTML tags that are contained within the original XML (the HTML may not be well-formed), which would require me to not escape < to &lt;. But I also need to escape — to &#8212; (long hyphen character), otherwise they show up as blocks on Windows (in both Firefox and IE). Should I escape some and not others?
tonysbd
Honestly, I only posted an answer because not having an encoding attached to the HTML jumped out at me... I've used XSL/XSLT very little, so I'm not sure I can answer your comment accurately. But I'm wondering how it might change things if you added the HTML namespace to certain elements within the XML file... Would that help your transformation in any way? Or is that a no-go because of possible malformedness? Also, would it be better to wrap the content of elements containing HTML in CDATA sections?
prodigitalson
+2  A: 

You say you are using Java/Xalan. Are you providing the output stream or stream writer? If so, you need to explicitly set the encoding at that point:

... new OutputStreamWriter(stream,"UTF-8");

Just including the UTF-8 headers does not actually cause the output file to be UTF-8 encoded.
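
To make that concrete, here is a minimal sketch of a transform whose writer is created with an explicit charset. It assumes the standard JAXP API (javax.xml.transform) rather than any Xalan-specific class. The class name Utf8TransformSketch, the transform helper, and the tiny sample stylesheet are made up for illustration and are not from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf8TransformSketch {

    // Hypothetical helper: applies the stylesheet and returns the raw output bytes.
    public static byte[] transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        // Tells the serializer which encoding to declare and assume.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        // The writer's charset decides the actual bytes. Without the second
        // argument, the platform default charset would be used instead.
        try (Writer writer = new OutputStreamWriter(stream, StandardCharsets.UTF_8)) {
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(writer));
        }
        return stream.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative stylesheet and input, much smaller than the ones in the question.
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
            + "<xsl:template match=\"/example\">"
            + "<p><xsl:value-of select=\"summary\"/></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<example><summary>\u201CJerry just gets it!\u201D</summary></example>";

        byte[] html = transform(xml, xsl);
        // Decode with the same charset that was used to encode.
        System.out.println(new String(html, StandardCharsets.UTF_8));
    }
}
```

If you hand the transformer an OutputStream instead of a Writer, the serializer does the encoding itself, using the encoding from xsl:output. With a Writer, the serializer only produces characters, so the writer's charset has to match what the HTML declares.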

Jim Garrison
Agreed. If you examine the HTML output from the Windows box in a hex viewer, you'll probably see the smart quotes as `93` and `94` and the NBSP as `A0` - their windows-1252 encodings. It works as you expect it to on your Linux box because UTF-8 happens to be the default encoding for that platform.
Alan Moore
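
The byte values mentioned in the comment above can be checked with a short sketch like the following. It assumes the JVM ships the windows-1252 charset (most do). The class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesSketch {

    // Hypothetical helper: encodes the text and returns the bytes as spaced hex.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: decodes raw bytes the way a browser told "charset=UTF-8" would.
    public static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Left smart quote, right smart quote, non-breaking space.
        String text = "\u201C\u201D\u00A0";

        System.out.println(hex(text, Charset.forName("windows-1252"))); // 93 94 A0
        System.out.println(hex(text, StandardCharsets.UTF_8));          // E2 80 9C E2 80 9D C2 A0

        // A lone 0x93 is not valid UTF-8, so it decodes to U+FFFD,
        // the replacement character.
        String decoded = decodeAsUtf8(new byte[] {(byte) 0x93});
        System.out.println(decoded.equals("\uFFFD")); // true
    }
}
```

U+FFFD is what Firefox draws as the black diamond with a question mark, which fits the symptom described in the question: windows-1252 bytes served under a UTF-8 label.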