<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
<!--#include file="conn.asp"-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Is the above code right?

+2  A: 

Yes.

UTF-8 is CP65001 in Windows (which is just a way of specifying UTF-8 in the legacy codepage stuff). From what I have read, ASP can handle UTF-8 when it is specified that way.

Joey
In what way is Codepage "legacy"?
AnthonyWJones
Historically, texts had a *code page*, which simply specified which character set to use. Those had numbers that differed from vendor to vendor; Windows seems to use a 16-bit unsigned integer for that purpose. Nowadays most encodings and character sets have *names* instead of *numbers*. I consider the fact that UTF-8 has a code page number (one that is neither specified nor used outside Microsoft) a way to ensure that it still works with the old 16-bit integer code page number system, even though UTF-8 is nothing like a code page in the first place.
Joey
@Johannes: The codepage number is still an important feature of how Windows handles character encoding. For example in .NET the Encoding class can only be instanced using the codepage number. I don't think Codepage is yet "legacy".
AnthonyWJones
It's only there for correct interoperability with previous and existing systems. Nowadays I guess such mechanisms would use names instead of arbitrary numbers simply because the encoding landscape has changed a bit since ye olde days of 1980.
Joey
A: 

Yes, 65001 is the Windows code page identifier for UTF-8, as documented on the Microsoft website. Wikipedia suggests that IBM code page 1208 and SAP code page 4110 are also indicators for UTF-8.

Tim
+2  A: 

Your code is correct although I prefer to set the CharSet in code rather than use the meta tag:-

<% Response.CharSet = "UTF-8" %>

The codepage 65001 does refer to the UTF-8 character set. You would need to make sure that your ASP page (and any includes) are saved as UTF-8 if they contain any characters outside of the standard ASCII character set.

By specifying the CODEPAGE attribute in the <%@ block you are indicating that anything written using Response.Write should be encoded to the codepage specified, in this case 65001 (UTF-8). It's worth bearing in mind that this does not affect any static content, which is sent verbatim, byte for byte, to the response. Hence the reason why the file needs to actually be saved using the codepage that is specified.

The CharSet property of the response sets the charset value of the Content-Type header. This has no impact on how the content may be encoded; it merely tells the client what encoding is being received. Again, it is important that this value match the actual encoding sent.
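
As a rough sketch of how the three pieces fit together (file encoding, the codepage used for dynamic output, and the charset declared to the client), a page might start like this. It assumes the .asp file itself is saved as UTF-8 and that Response.CodePage is available on your IIS version; the greeting text is only an illustrative placeholder.

```asp
<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
<%
' Tell the client what it is receiving: this adds charset=utf-8
' to the Content-Type response header. It does not re-encode anything.
Response.CharSet = "UTF-8"

' Encode dynamic output (Response.Write) as UTF-8. This mirrors the
' CODEPAGE attribute above and is shown only to make the setting explicit.
Response.CodePage = 65001

' Dynamic content: encoded by ASP using the codepage set above.
' The literal below is illustrative and is only read correctly
' if this file is saved as UTF-8.
Response.Write "<p>Grüße aus ASP</p>"
%>
<!-- Static content: sent byte for byte exactly as stored in the file,
     so the file must be saved as UTF-8 for non-ASCII text to survive. -->
<p>Grüße aus dem statischen Teil</p>
```

If the file were saved in a different encoding (for example Windows-1252), the static paragraph would go out in that encoding while the header still claimed UTF-8, and the client would display garbled characters.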

AnthonyWJones