I support a website written in Tcl which displays data in Traditional Chinese (big5). We then have a Java servlet, using the translation code from mandarintools.com, to translate a page request into Simplified Chinese. The conversion as specified to the translation code is from UTF-8 to UTF-8S; Java is apparently correctly translating the data to UTF-8 as it comes in.

The Java translation code works but is slow, and since the website is written in Tcl, someone on another list suggested I try doing the translation in Tcl instead. Unfortunately, Tcl doesn't support UTF-8S, and I have been unable to figure out which encoding to use in its place. I've tried gb2312, gb2312-raw, gb1988, euc-cn... all result in gibberish. My assumption is that Tcl is also converting the data to UTF-8 as it comes in, though I have tried converting from big5 first and it doesn't help.

My test code looks like this:

set page_body [ns_httpget http://www.mysite.com]
set translated_page_body [encoding convertto gb2312 $page_body]
ns_write $translated_page_body

I have also tried

set page_body [ns_httpget http://www.mysite.com]
set translated_page_body [encoding convertto gb2312 [encoding convertfrom big5 $page_body]]
ns_write $translated_page_body

But it didn't change anything.

Does anyone out there have enough experience with this to help me figure it out?

A: 

By any chance, are you grabbing your data from Oracle?

If so, see if you can use the CONVERT function to convert from "utf8" to "al32utf8", which is the true UTF-8 standard and which Tcl should work with seamlessly.

If not, well, I guess I'll wait for your comment(s).

hythlodayr
Nope, I'm actually requesting a page via http - that's the first line in each of the code snippets above. The data comes out of Postgres originally, is assembled into the Traditional Chinese page, and then I need to take that and make a Simplified version of it.
Janine
Ahh, I completely misunderstood. I don't know if this is related, but at least for Tcl 8.3 and Tcl 8.4 there were problems with the gb2312 locales (the article also mentions a work-around). http://aspn.activestate.com/ASPN/Mail/Message/tcl-bugs/1215168 I hope that helps.
hythlodayr
Looked promising, but I've tried it and no joy. I'm using Tcl 8.5.7, in case that makes a difference.
Janine
I'm afraid I've hit a wall. All I can do at this point is refer you to the Tcl newsgroup. Some of the big Tcl guys (e.g., Kevin Kenny) are quite responsive, so you may have some luck there. Sorry.
hythlodayr
A: 

FYI, for completeness' sake: I've been told by Tcl experts that you can't do the conversion this way; it has to be done via character replacement.
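
For anyone landing here later, a minimal sketch of what character replacement could look like in Tcl. As far as I understand it, encoding convertto only changes how the same characters are stored as bytes, so it cannot turn a Traditional character into its Simplified counterpart. The sketch uses string map with a lookup table instead. The proc name t2s and the five-pair table are made up for illustration; a usable table would need thousands of pairs from a real Traditional-to-Simplified data set.

```tcl
# Hypothetical, deliberately tiny Traditional -> Simplified table,
# written as alternating "from to" elements for [string map].
# \u escapes are used so the script does not depend on the encoding
# of the source file. Illustration only, not a complete mapping.
set t2s_map [list \
    \u6F22 \u6C49 \
    \u570B \u56FD \
    \u5B78 \u5B66 \
    \u8A9E \u8BED \
    \u9AD4 \u4F53 \
]

# t2s is a hypothetical helper name, not a built-in Tcl command.
# Characters that are not in the table pass through unchanged.
proc t2s {text} {
    global t2s_map
    return [string map $t2s_map $text]
}

# Possible use in the page handler from the question (assumes
# $page_body already holds decoded characters, not raw big5 bytes):
#   set page_body [ns_httpget http://www.mysite.com]
#   ns_write [t2s $page_body]
```

If the fetched page arrives as raw big5 bytes rather than decoded text, it would presumably need encoding convertfrom big5 before the mapping is applied. Note that a one-to-one table like this cannot handle cases where the right Simplified form depends on the surrounding word.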

Janine