Is there a way to mark part of an HTML file as being in a different encoding?

The default encoding for the (generated) HTML is UTF-8. However, some of the data to be inserted into the HTML is in another encoding. It's something like:

 <div>
     the normal html in utf-8
 </div>

 <div>
     <%= raw_data_in_another_encoding %>
 </div>

Is there a way to hint to the browser that it should render the second <div> in another encoding? Thanks.

A: 

No, the entire file must have a single encoding. If you're saving a plain .html file, you'll have to convert the entire file to one encoding.

If you're using a server-side scripting language, however, you can always convert text from one encoding to another. You might designate UTF-8 as the encoding for the page, and then when you encounter bits of content currently encoded in, say, latin1, you can simply convert it to UTF-8 before outputting it.

How you do that, of course, would depend on the particular server-side language you're using.

In PHP, you could do:

echo iconv('ISO-8859-1', 'UTF-8', $someLatin1Text);
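Since the question's `<%= %>` template suggests Ruby, here is a rough Ruby counterpart to the PHP line. It is a sketch using the built-in `String#encode` (Ruby 1.9 and later) rather than iconv, and it assumes the text really is ISO-8859-1; `latin1_text` is a made-up sample.

```ruby
# Made-up sample: the bytes of "café" in ISO-8859-1 (0xE9 is "é").
latin1_text = "caf\xE9".dup.force_encoding("ISO-8859-1")

# Transcode to UTF-8 before inserting it into a UTF-8 page.
utf8_text = latin1_text.encode("UTF-8")

puts utf8_text           # prints café
puts utf8_text.bytesize  # prints 5 ("é" takes two bytes in UTF-8)
```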
VoteyDisciple
iconv raises an error (`Iconv::IllegalSequence in SomeView#show`) if $someLatin1Text contains something outside the acceptable character range.
ohho
Yes. You'd have to know what encoding you're using, and specify that as the first parameter. In my example, I'm assuming that `$someLatin1Text` actually does contain latin1 text.
VoteyDisciple
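The `Iconv::IllegalSequence` problem raised in these comments can be sidestepped by substituting a placeholder for malformed bytes instead of raising. Below is a minimal Ruby sketch using `String#encode`'s `invalid:`, `undef:` and `replace:` options, plus `String#scrub` (Ruby 2.1+) for the case where source and target are both UTF-8, since no transcoding (and therefore no replacement) happens then. `to_utf8_lossy` is a hypothetical helper name, and the caller still has to know, or assume, the source encoding.

```ruby
# Hypothetical helper: turn raw bytes believed to be in source_encoding into
# UTF-8, replacing malformed or unmappable sequences instead of raising.
def to_utf8_lossy(raw, source_encoding, replacement = "?")
  str = raw.dup.force_encoding(source_encoding)
  if str.encoding == Encoding::UTF_8
    # Same source and target encoding: clean invalid bytes explicitly.
    str.scrub(replacement)
  else
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace,
               replace: replacement)
  end
end

puts to_utf8_lossy("abc\xFFdef", "UTF-8")    # prints abc?def
puts to_utf8_lossy("\x82\xA0", "Shift_JIS")  # prints あ
```

The trade-off is silent data loss: a "?" in the output tells you something was wrong, but not what the original bytes were.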
A: 

You can send any arbitrary encoding at any point in your HTTP response stream, but generally your client won't be able to deal with it. In HTML, multiple encodings in the same document simply aren't permitted, nor are they handled gracefully by any modern client, except perhaps by accident.

If you are using Ruby (guessing based only on your naming conventions), you can convert a string from one encoding to another using the iconv library. If you're using something else, there's most likely a similar alternative. PHP and Python both offer encoding translation options based on the iconv library. In the .NET Framework, you can use the Encoding class to obtain the appropriate source encoding (for example, via Encoding.GetEncoding) and call its GetString method with your source byte array to get a string suitable for further manipulation.
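In Ruby 1.9 and later the same "decode these bytes" step is split between two built-in methods: `force_encoding` only relabels the bytes, while `encode` actually converts them. A minimal sketch, assuming the bytes are known to be Shift_JIS; the sample bytes are made up for illustration.

```ruby
# Made-up sample: untagged (binary, ASCII-8BIT) bytes, as a driver or file
# read might return them. They happen to be Shift_JIS for "日本".
raw = "\x93\xFA\x96\x7B".b

# force_encoding only relabels: the bytes stay exactly the same.
sjis = raw.dup.force_encoding("Shift_JIS")

# encode actually converts: the result holds new, UTF-8 bytes.
utf8 = sjis.encode("UTF-8")

puts utf8                                   # prints 日本
puts [raw.bytesize, utf8.bytesize].inspect  # prints [4, 6]
```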

Numeric character references are another option if you are primarily using another encoding and only occasionally need characters outside that encoding's supported range. However, you're generally going to stay saner by converting legacy encodings to UTF-8.
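To illustrate the numeric character reference approach: once the text has been decoded correctly, every character outside ASCII can be written as `&#NNN;`, which survives under any ASCII-compatible page encoding. This is a sketch only; `to_ascii_html` and `HTML_ESCAPES` are hypothetical names, and the escaping shown is minimal, not a full sanitizer.

```ruby
# Minimal escaping of HTML-special characters (not a full sanitizer).
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical helper: escape markup characters, then turn every non-ASCII
# character into a decimal numeric character reference such as "&#233;".
def to_ascii_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
      .gsub(/[^\x00-\x7F]/) { |ch| format("&#%d;", ch.ord) }
end

html = to_ascii_html("café <b>")
puts html              # prints caf&#233; &lt;b&gt;
puts html.ascii_only?  # prints true
```

The markup characters are escaped first, in a single pass, so the `&` introduced by the character references is never escaped a second time.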

JasonTrue
iconv generates an error if the encoded characters are not in the character range defined for that encoding. Since it's user data, I cannot guarantee that it conforms to the encoding's range. That's why I chose to pass the data to the browser directly...
ohho
Is the user data sent to you from another web page? Then you should store it in UTF-8. If it's sent to you in a page properly marked as UTF-8, you'll get it already encoded. If it's from another data source, you should be tagging it at the time of storage. If that's not possible, and you don't know what encoding it is, your only real option is to send out that data in something like an IFRAME without sending any encoding metadata, so the user can force the IFRAME's page encoding to something else, but that's very drastic and not very discoverable to most users.
JasonTrue
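A different, server-side option (a common heuristic, not what this answer proposes) is to check whether the bytes already form valid UTF-8 and, if not, fall back to a guessed legacy encoding. In the sketch below, `guess_to_utf8` is a hypothetical name and Windows-1252 is an assumed default; pick whatever legacy encoding your data is most likely to be in.

```ruby
# Hypothetical helper: keep bytes that are already valid UTF-8; otherwise
# assume fallback_encoding (a guess you must supply) and transcode,
# replacing anything that still cannot be mapped.
def guess_to_utf8(raw, fallback_encoding = "Windows-1252")
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  raw.dup.force_encoding(fallback_encoding)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

puts guess_to_utf8("caf\xC3\xA9".b)  # valid UTF-8 already: prints café
puts guess_to_utf8("caf\xE9".b)      # treated as Windows-1252: prints café
```

This can still guess wrong: a short legacy-encoded string occasionally happens to be valid UTF-8, so tagging the encoding at storage time remains the better fix.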
The data comes from the user's database.
ohho
I presume you mean that the database is sent to you by a user and doesn't store any metadata about the encoding. But if your database is one of the common ones (MS SQL Server, MySQL, etc.), the database itself has a collation attribute at either the database level (the default) or the field level. The collation generally specifies, explicitly or implicitly, the encoding used by the database or field you're examining. (UTF-8 is explicit; something like Japanese_CI_AS implies Shift-JIS, specifically code page 932, on SQL Server for varchar.) You can use this information to choose the encoding to pass to iconv.
JasonTrue
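Turning the collation into an encoding name can be sketched as a lookup table. The table below is hypothetical and deliberately partial: its entries reflect that SQL Server's Japanese_CI_AS uses code page 932 for varchar (Ruby calls it `Windows-31J`) and that MySQL's `latin1` is essentially Windows cp1252. `COLLATION_ENCODINGS` and `column_to_utf8` are illustrative names; check your own server's documentation before relying on any entry.

```ruby
# Hypothetical, deliberately partial map from collation names to Ruby
# encoding names.
COLLATION_ENCODINGS = {
  "Japanese_CI_AS"     => "Windows-31J",  # SQL Server varchar, code page 932
  "latin1_swedish_ci"  => "Windows-1252", # MySQL "latin1" is essentially cp1252
  "utf8mb4_general_ci" => "UTF-8",
}.freeze

# Hypothetical helper: decode a column value using its collation's encoding.
# A strict conversion: malformed bytes raise an Encoding error.
def column_to_utf8(raw, collation)
  source = COLLATION_ENCODINGS.fetch(collation) do
    raise ArgumentError, "no encoding known for collation #{collation}"
  end
  raw.dup.force_encoding(source).encode(Encoding::UTF_8)
end

puts column_to_utf8("\x93\xFA\x96\x7B".b, "Japanese_CI_AS")  # prints 日本
```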