+2  A: 

I'm gonna guess that the editor you are using doesn't work with UTF-8, and is converting everything to ASCII.

The simple answer is to stop using special characters in HTML pages. The copyright symbol should be written as &copy; or &#169;.
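If you want to do that replacement automatically rather than by hand, here is a rough sketch in Python (not ASP, so treat it as an illustration only). The helper name `to_ascii_html` is made up for this example; it swaps every non-ASCII character for a numeric character reference, so the page survives whatever encoding the editor or server applies.

```python
import html


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper for illustration; ASCII characters pass through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_html("UltraSling ® III"))  # UltraSling &#174; III
    # The named and numeric references decode to the same character.
    print(html.unescape("&copy;") == html.unescape("&#169;") == "©")  # True
```

Note that this does not escape `<`, `>` or `&`; it only deals with the characters outside ASCII.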

James Curran
Pardon the quibble, but it can't be reading the text as ASCII because ASCII doesn't support accented letters or the copyright symbol. It has to be using an eight-bit encoding like ISO-8859-1 or windows-1252.
Alan Moore
+1  A: 

From my experience with this exact problem, I found that these characters popped up a lot because 1) the user was using a non-English character set (and keyboard) when the content was entered (e.g. Spanish), and 2) the content was not converted to UTF-8. You're on the right track checking the content type in the header, but if this keeps happening you really have to run the content through a converter as well. This problem caused me hours of pain, many years ago, with Classic ASP (I wish I still had access to the code to be of further help).
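I no longer have the ASP code, but the conversion step looks roughly like this in Python. This is only a sketch: the function name `legacy_to_utf8` is made up, and it assumes the incoming bytes are Windows-1252, which you would have to confirm for your own data.

```python
def legacy_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a legacy single-byte encoding as UTF-8.

    source_encoding is an assumption; decoding with the wrong one
    produces garbage rather than an error for most byte values.
    """
    text = raw.decode(source_encoding)  # bytes -> Unicode text
    return text.encode("utf-8")         # Unicode text -> UTF-8 bytes


if __name__ == "__main__":
    legacy = "Señor ®".encode("cp1252")  # what a Windows-1252 form might send
    print(legacy_to_utf8(legacy))        # b'Se\xc3\xb1or \xc2\xae'
```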

JasonMichael
A: 

Thanks, guys.

James, it's entered as ®, but the problem is that it gets translated when pulled back to the screen in Edit mode. I've tried HTMLEncode, doing a string replacement, etc. HTMLEncode results in UltraSling ® III

JasonMichael, thanks. I'm pretty sure it's not getting converted to UTF-8. I've tried running it through a converter, and it tells me "exported SGML document text"

This is the document header info (I've been messing with this endlessly):

<!DOCTYPE xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Thanks again. -m

Michael, SO is different from typical forums and NGs. If you want to respond to a specific answer, add a comment to that answer. If you see that you need to supply additional details, edit your question. SO is not sequential; votes generally dictate the order in which answers are read.
AnthonyWJones
A: 

Â® is what ® looks like if it's stored as UTF-8 but displayed as ISO-8859-1/Windows-1252. Using the meta tag is not enough to make sure your page is being served as UTF-8. You will also need to set the encoding in the Content-Type HTTP header. This header is typically set either with some server-wide setting or programmatically.
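You can reproduce the effect outside the browser. The following Python snippet is just a demonstration of the byte-level mismatch, nothing ASP-specific; the helper name `misread` is made up for the example.

```python
def misread(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


if __name__ == "__main__":
    print("®".encode("utf-8"))             # b'\xc2\xae' (two bytes)
    print(misread("UltraSling ® III"))     # UltraSling Â® III
    # Decoding with the right encoding gives the original text back.
    print("®".encode("utf-8").decode("utf-8"))  # ®
```

The single character becomes two bytes in UTF-8, and a single-byte encoding shows each byte as its own character.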

I don't know ASP, but this seems to be how you should set that header:

http://stackoverflow.com/questions/250609/htmlencode-utf-8

And this might provide some more information:

http://technet.microsoft.com/en-us/library/bb742422.aspx#EBAA

If your data is stored in a database, you'll also need to make sure the data is either stored in UTF-8 as well, or converted when storing and retrieving it.

mercator
+1  A: 

The fundamental problem is the impact of Response.Codepage on form posts.

When you send a form to a client specifying that the content is encoded as UTF-8, the browser will assume that the content of form posts should be sent encoded as UTF-8.

Now the action page that receives the post will (somewhat counter-intuitively) use the value of Response.Codepage to determine how the characters in the post are encoded. This isn't obvious because we tend to think it's the job of the sender to define the encoding of what it's sending. It also isn't a natural leap to think that a property governing the encoding of the response we want to send would have anything to do with how the initial request is read. In this case it does.

What's happening is that your form is posting a UTF-8 encoded version of the character, but the page that receives it does not have its Response.Codepage set to 65001 (the UTF-8 codepage). It's probably set to the system's default ANSI codepage, such as 1252. Hence the two-byte UTF-8 encoding of the character gets interpreted as two individual characters.
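To make the mismatch concrete, here is a small Python analogue of what happens to the posted value. It is not ASP and does not use Response.Codepage; it only mimics a browser percent-encoding a form value as UTF-8 and a receiver decoding it with either code page. The function names are made up for the example.

```python
from urllib.parse import quote_plus, unquote_plus


def browser_post(value: str) -> str:
    """Percent-encode a form value the way a browser does for a UTF-8 page."""
    return quote_plus(value, encoding="utf-8")


def server_read(posted: str, codepage: str) -> str:
    """Decode the posted value using the code page the receiver assumes."""
    return unquote_plus(posted, encoding=codepage)


if __name__ == "__main__":
    posted = browser_post("UltraSling ® III")
    print(posted)                         # UltraSling+%C2%AE+III
    print(server_read(posted, "cp1252"))  # UltraSling Â® III
    print(server_read(posted, "utf-8"))   # UltraSling ® III
```

The bytes on the wire are the same in both cases; only the receiver's assumption about them changes.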

My recommendations for good character handling in ASP are:-

  • Save all pages as UTF-8
  • Include <%@ codepage=65001 %> at the top of all pages
  • Include <% Response.CharSet = "UTF-8" %> at the top of all pages
  • Store posted data in a Unicode field type such as SQL Server's NVARCHAR type.

The important thing here is that before you read form values in an ASP page, you need to make sure that Response.Codepage is set to a codepage that matches the sender's encoding, and this doesn't happen automatically.

AnthonyWJones
A: 

I'm having the same problem with PHP code. I think I will try a different PC/editor.