I am creating a ColdFusion page that takes language translation data stored in a table in my database and generates static .js files for each language pairing of English to _ etc...

I am now starting to work on Russian; I was able to get the other languages to work fine.

However, when the page saves the file, all the text comes out as question marks. Even when I run my translation app, the text for just that language displays as all ?????

I have tried writing it via cffile as UTF-8 and as ISO-8859-1, but neither gets it to display properly.

Any suggestions?

A: 

Have you tried ISO-8859-5? I believe it's the encoding that "should" be used for Russian.

Robin
When I do that, it appears to be a non-Russian charset.
crosenblum
It seems like you're not the only one with this problem: http://www.sitepoint.com/forums/showthread.php?t=566418
Robin
A: 

I can't personally reproduce this problem at all. Is the ColdFusion template that is making the call itself saved as UTF-8? (With or without a BOM; it matters not for Russian.) In any case, UTF-8 is absolutely what you should be using. Make sure you get a UTF-8-compliant editor, which is most things on a Mac. On Windows you could use SciTE or GVim.
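One quick way to verify whether a template really is UTF-8 on disk, with or without a BOM, is to inspect the raw bytes. A minimal sketch in Python (the path is whatever file you want to check):

```python
import codecs

def check_utf8(path):
    """Return (decodes_as_utf8, has_bom) for the file at path."""
    raw = open(path, 'rb').read()
    has_bom = raw.startswith(codecs.BOM_UTF8)  # the bytes EF BB BF
    try:
        raw.decode('utf-8')
        return True, has_bom
    except UnicodeDecodeError:
        return False, has_bom
```

A file saved as cp1251 or ISO-8859-5 will almost always fail the decode step, because Cyrillic byte sequences in those single-byte charsets are not valid UTF-8.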

ryber
+1  A: 

By all means do use UTF-8 over any other encoding type. You need to make sure that:

  • your .cfm templates were written to disk with UTF-8 encoding (Notepad++ handles that nicely, and so do Eclipse and the new ColdFusion Builder)
  • your database was created with the proper codepage for nvarchar (and varchar) datatypes
  • your database connection handles UTF-8

How to go about the last two items depends on your database back-end. ColdFusion is quite agnostic in that regard, as it will happily use any JDBC driver that you may need.
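For example, if the back-end happens to be MySQL, the Connector/J JDBC driver takes its character handling from parameters in the connection URL (the host and database name below are placeholders):

```
jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8
```

Other drivers expose equivalent settings under different names; check the documentation for yours.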

When working in a multi-character-set environment, character-set conversion issues can occur, and it can be difficult to determine where the conversion went wrong.

There are two categories into which conversion issues can be placed. The first involves sending data in the wrong format to the client API. Although this cannot happen with Unicode APIs, it is possible with all other client APIs and results in garbage data.

The second category of issue involves a character that does not have an equivalent in the final character set, or in one of the intermediate character sets. In this case, a substitution character is used. This is called lossy conversion and can happen with any client API. You can avoid lossy conversions by configuring the database to use UTF-8 for the database character set.
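The lossy case is easy to demonstrate. A small sketch in Python (the mechanics are identical in any client language):

```python
text = 'Запуск'  # Russian for "Launch"

# Lossy conversion: ISO-8859-1 has no Cyrillic, so every character
# is replaced by the substitution character '?' -- exactly the
# "??????" symptom described in the question.
print(text.encode('iso-8859-1', errors='replace'))  # b'??????'

# UTF-8 has a code point for everything, so the round trip is lossless.
print(text.encode('utf-8').decode('utf-8') == text)  # True
```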

The advantage of UTF-8 over any other encoding is that you can handle any number of languages in the same database / client.

Vincent Buck
A: 

The correct encoding to use in a .js file is whatever encoding the parent page is in. Whilst there are methods to serve JavaScript using a different encoding to the page including it, they don't work on all browsers.

So make sure your web page is being saved and served in an encoding that contains the Russian characters, and then save the .js file using the same encoding. That will be either:

  • ISO-8859-5. A single-byte encoding with Cyrillic in the high bytes, similar in purpose to Windows code page 1251 (though the two are not byte-compatible). cp1251 will be the default encoding when you save in a text editor on a Russian install of Windows;

  • or UTF-8. A multi-byte encoding that contains every character. All modern websites should be using UTF-8.

(ISO-8859-1 is Western European and does not include any Cyrillic. It is similar to code page 1252, the default on a Western Windows install. It's of no use to you.)
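The differences between these charsets are visible at the byte level. A sketch in Python, showing the same Russian word under each encoding:

```python
text = 'Запуск'

# Three valid but mutually incompatible byte representations:
print(text.encode('iso-8859-5'))  # b'\xb7\xd0\xdf\xe3\xe1\xda'
print(text.encode('cp1251'))      # b'\xc7\xe0\xef\xf3\xf1\xea'
print(text.encode('utf-8'))       # b'\xd0\x97\xd0\xb0\xd0\xbf\xd1\x83\xd1\x81\xd0\xba'

# ISO-8859-1 simply has no Cyrillic code points at all:
try:
    text.encode('iso-8859-1')
except UnicodeEncodeError:
    print('no Cyrillic in ISO-8859-1')
```

Since the byte sequences are incompatible, the page and the .js file must agree on which encoding is in use, and the server must declare it correctly.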

So, best is to save both the CF template and the .js file as UTF-8, and add <cfprocessingdirective pageencoding="utf-8"> if CF doesn't pick it up automatically.

If you can't control the encoding of the page that includes the script (for example because it's a third party), then you can't use any non-ASCII characters directly. You would have to use JavaScript string literal escapes instead:

var translation_ru= {
    launchMyCalendar: '\u0417\u0430\u043f\u0443\u0441\u043a \u041c\u043e\u0439 \u043a\u0430\u043b\u0435\u043d\u0434\u0430\u0440\u044c'
};
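Since the .js files are generated by a script anyway, escapes like those can be produced automatically rather than by hand. A sketch in Python (json.dumps escapes all non-ASCII characters by default; the translation key is the hypothetical one from above):

```python
import json

translations = {'launchMyCalendar': 'Запуск Мой календарь'}

# ensure_ascii=True (the default) turns every non-ASCII character
# into a \uXXXX escape, so the output file is pure ASCII and
# survives any charset mislabelling between server and browser.
body = json.dumps(translations, ensure_ascii=True)
print('var translation_ru = ' + body + ';')
```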

when it saves to file it is "·ÐßãáÚ ¼ÞÙ ÚÐÛÕÝÔÐàì" so the charset is wrong

Looks like you've saved the file in a single-byte Cyrillic encoding (those bytes decode as ISO-8859-5) and then copied it to a Western server, where it's being read as cp1252, the default codepage on a Western machine.
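That diagnosis can be checked exactly: the garbled string quoted above is what you get when Cyrillic bytes are decoded as cp1252. A quick round trip in Python, assuming the file was written as ISO-8859-5:

```python
text = 'Запуск Мой календарь'

# Encode as ISO-8859-5 (a single-byte Cyrillic charset), then decode
# the same bytes as Windows cp1252, as a Western machine would:
mojibake = text.encode('iso-8859-5').decode('cp1252')
print(mojibake)  # ·ÐßãáÚ ¼ÞÙ ÚÐÛÕÝÔÐàì

# Reversing the round trip recovers the original text:
print(mojibake.encode('cp1252').decode('iso-8859-5') == text)  # True
```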

I also just found out that my text editor of choice, TextPad, doesn't support Unicode.

Yes, that was my reason for no longer using it too. EmEditor (commercial) and Notepad++ (open-source) are good replacements.

bobince