ansaurus

Question

Is it possible to set two encodings for one HTML file?

Answer 1

A:

No, the entire file must have a single encoding. If you're saving a plain .html file, you'll have to convert the entire file to one encoding.

If you're using a server-side scripting language, however, you can always convert text from one encoding to another. You might designate UTF-8 as the encoding for the page, and then when you encounter bits of content currently encoded in, say, latin1, you can simply convert it to UTF-8 before outputting it.

How you do that, of course, would depend on the particular server-side language you're using.

In PHP, you could do:

echo iconv('ISO-8859-1', 'UTF-8', $someLatin1Text);
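The same decode-then-re-encode step can be sketched in Python, for comparison. This is only an illustration: the function name `latin1_to_utf8` and the sample bytes are made up for the example, and it assumes the input really is ISO-8859-1.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input really is Latin-1)."""
    # Decode using the encoding the bytes are currently in,
    # then encode to the encoding declared for the page.
    return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample: "café" stored as Latin-1 (0xE9 is "é").
some_latin1_text = b"caf\xe9"
utf8_bytes = latin1_to_utf8(some_latin1_text)
```

The resulting bytes can be written to a response that declares UTF-8 as its charset.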

VoteyDisciple 2010-06-08 03:16:20

iconv will show an error: `Iconv::IllegalSequence in SomeView#show` if the $someLatin1Text contains something not in the acceptable character range.

ohho 2010-06-08 04:53:24

Yes. You'd have to know what encoding you're using, and specify that as the first parameter. In my example, I'm assuming that `$someLatin1Text` actually does contain latin1 text.
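If the input cannot be trusted to match the encoding it claims, one option is to try a strict conversion first and fall back to substituting a replacement character for the bad bytes. A rough Python sketch of that idea follows; the helper name `to_utf8` is hypothetical, and whether silent replacement is acceptable depends on the data.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert raw bytes to UTF-8, tolerating bytes that are invalid in source_encoding."""
    try:
        # Strict decode: raises UnicodeDecodeError on an illegal sequence,
        # which is roughly the situation iconv complains about.
        text = raw.decode(source_encoding)
    except UnicodeDecodeError:
        # Fallback: each undecodable sequence becomes U+FFFD instead of aborting.
        text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")
```

Note that Python's ISO-8859-1 codec accepts every byte value, so the fallback only comes into play for stricter encodings such as UTF-8 or Shift-JIS.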

VoteyDisciple 2010-06-08 12:41:54

Answer 2

A:

You can send any arbitrary encoding at any point in your HTTP response stream, but generally your client won't be able to deal with it. In HTML, multiple encodings in the same document simply aren't permitted, nor are they gracefully handled by any modern client, except perhaps by accident.

If you are using Ruby (guessing based only on your naming conventions), you can convert a string from one encoding to another using the iconv library. If you're using something else, there's most likely a similar alternative. PHP and Python both offer some encoding translation options based on the iconv library. In the .NET Framework, you can use the Encoding class to grab the suitable source encoding, and call GetString with your source byte array as the parameter to get a string suitable for further manipulation.

Numeric character references are another option, if you are primarily using another encoding and only occasionally using characters outside of that encoding's supported range. However, you're generally going to stay saner by converting to and from UTF-8 from legacy encodings.
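As a small illustration of the character-reference approach, Python's built-in `xmlcharrefreplace` error handler emits a decimal reference for any character the target encoding cannot represent. The function name `encode_with_ncrs` and the sample string are assumptions made for this sketch.

```python
def encode_with_ncrs(text: str, page_encoding: str) -> bytes:
    """Encode text for a page in page_encoding, using &#NNNN; for unsupported characters."""
    # Characters the encoding supports are encoded normally;
    # everything else becomes a decimal numeric character reference.
    return text.encode(page_encoding, errors="xmlcharrefreplace")


# "é" exists in ISO-8859-1; the two Japanese characters do not.
body = encode_with_ncrs("caf\u00e9 \u65e5\u672c", "iso-8859-1")
# body == b"caf\xe9 &#26085;&#26412;"
```

This only makes sense for text that ends up in HTML content, where the browser will resolve the references.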

JasonTrue 2010-06-08 03:27:50

iconv will generate an error if the encoded characters are not in the character range defined for that encoding. Since it's user data, I cannot guarantee that it conforms to the encoding's range. That's why I chose to pass the data to the browser directly...

ohho 2010-06-08 04:46:38

Is the user data sent to you from another web page? Then you should store it in UTF-8. If it's sent to you in a page properly marked as UTF-8, you'll get it already encoded. If it's from another data source, you should be tagging it at the time of storage. If that's not possible, and you don't know what encoding it is, your only real option is to send out that data in something like an IFRAME without sending any encoding metadata, so the user can force the IFRAME's page encoding to something else, but that's very drastic and not very discoverable to most users.

JasonTrue 2010-06-08 05:26:17

the data is from user database.

ohho 2010-06-08 06:11:07

I presume you mean that the database is sent to you by a user and doesn't store any metadata about the encoding. But if your database is one of the common ones (MS SQL Server, MySQL, etc.), the database itself has a collation attribute at either the database level (the default) or at the field level. The collation generally either explicitly or implicitly specifies the encoding supported by the database or field you're examining. (UTF-8 is explicit; something like Japanese_CI_AS implies Shift-JIS on MS SQL Server for varchar.) You can use this information to choose the encoding to pass to iconv.
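A rough Python sketch of that lookup, assuming the collation name has already been read from the database's metadata. The mapping table is a hypothetical, hand-maintained one with two illustrative entries (a real one would need an entry for each collation actually in use), and `column_to_utf8` is a made-up helper name.

```python
# Hypothetical mapping from collation name to a Python codec name.
# "cp932" is the Windows code page for Shift-JIS.
COLLATION_TO_CODEC = {
    "Japanese_CI_AS": "cp932",
    "utf8_general_ci": "utf-8",
}


def column_to_utf8(raw: bytes, collation: str) -> bytes:
    """Convert raw column bytes to UTF-8 using the encoding implied by the collation."""
    codec = COLLATION_TO_CODEC.get(collation)
    if codec is None:
        # Refuse to guess when the collation is not in the table.
        raise LookupError(f"no codec known for collation {collation!r}")
    return raw.decode(codec).encode("utf-8")
```

Failing loudly on an unknown collation keeps wrongly decoded text from slipping into the page unnoticed.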

JasonTrue 2010-06-08 15:06:48
