Why is it that when I use Firefox to enter 漢, the GET request is transformed to:

q=%E6%BC%A2&start=0

However, when I use IE8 and type the same Chinese character, the GET request is:

q=?&start=0

It turns it into a question mark.

A: 

It could be either a font installation issue or a URL encoding issue.

One of the main issues I have seen when dealing with CJK characters is that East Asian language fonts are not installed by default when the OS is installed. These characters may still show up properly in MS Word without them. To make sure all applications on the OS can handle CJK (Chinese, Japanese and Korean) text, it is better to do the following:

  1. Go to Control Panel.
  2. Select Regional and Language Options.
  3. Go to the Languages tab.
  4. Select the checkbox to install fonts for East Asian languages.

You will probably need your Windows installation CD to complete this.

After that, IE8 should hopefully display the characters properly.

Also, if you are doing any URL encoding yourself, make sure you always use UTF-8 as the character encoding for non-ASCII characters.
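Python's standard urllib.parse module serves here only as a neutral illustration of this advice (Python is an assumption, not something the question uses). It percent-encodes from UTF-8 by default, which reproduces the exact query string Firefox sent, whereas forcing the text through a code page that lacks the character leaves nothing but an encoded question mark.

```python
from urllib.parse import quote, unquote, urlencode

kan = "\u6f22"  # 漢, the character from the question

# urlencode() percent-encodes values from their UTF-8 bytes by default.
query = urlencode({"q": kan, "start": 0})
print(query)  # q=%E6%BC%A2&start=0

# Decoding with the same encoding restores the character.
print(unquote("%E6%BC%A2", encoding="utf-8") == kan)  # True

# If the text is first forced through a code page lacking the character,
# it is already "?" before percent-encoding, so only %3F is sent.
print(quote(kan, encoding="cp1252", errors="replace"))  # %3F
```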

Fazal
The question mark is usually the default character substituted when calling WideCharToMultiByte (or its .NET equivalent) on a character the target code page cannot represent. So most likely, the page encoding is just not marked correctly. If he's seeing the "Kan" character at all, a font suitably marked for IE's or Firefox's font fallback mechanism is already installed.
JasonTrue
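WideCharToMultiByte is a Win32 call, but the lossy conversion it performs can be sketched portably in Python: encoding to a code page that has no mapping for the character, with errors="replace", substitutes a literal '?', much like WideCharToMultiByte's default character. Code page 1252 is an assumed stand-in for a Western ANSI code page.

```python
kan = "\u6f22"  # 漢, the character from the question

# Strict conversion to Western European cp1252 fails: there is no mapping.
try:
    kan.encode("cp1252")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# With errors="replace", the unmappable character becomes a literal "?".
lossy = kan.encode("cp1252", errors="replace")
print("replace:", lossy)  # replace: b'?'

# The conversion is one-way: decoding gives back "?", not 漢.
print("decoded:", lossy.decode("cp1252"))  # decoded: ?
```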
A: 

To begin with, IE believes that Chinese characters can be sent 'as is' in UTF-8, while Firefox thinks they need to be URL-encoded.

Have you watched the GET request on the wire? I bet that it's really a three-byte sequence and that the tool you are using to display it is reducing it to a '?'.
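To check this, capture the raw request (for example with a proxy such as Fiddler) and look at the bytes of the q value rather than a rendered view. The helper below is a hypothetical sketch, not part of any tool, that tells apart the cases discussed in this thread: a literal '?', percent-encoding, raw UTF-8, or some other byte encoding.

```python
def classify_query_value(raw: bytes) -> str:
    """Hypothetical helper: describe the raw bytes of one query-string value."""
    if raw == b"?":
        return "question mark"    # the character was lost before sending
    if b"%" in raw:
        return "percent-encoded"  # e.g. %E6%BC%A2, as Firefox sent
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "non-utf8 bytes"   # probably a legacy code page
    return "raw utf-8"            # sent "as is", e.g. 3 bytes for 漢


kan_utf8 = "\u6f22".encode("utf-8")
print(len(kan_utf8), classify_query_value(kan_utf8))  # 3 raw utf-8
print(classify_query_value(b"%E6%BC%A2"))              # percent-encoded
print(classify_query_value("\u6f22".encode("cp932")))  # non-utf8 bytes
```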

bmargulies
A:

Mark the encoding of the page as UTF-8 and this problem will go away. Without this hint, Firefox will sometimes fail to autodetect your encoding, too. You may also have manually changed the encoding in IE at some point, in which case that becomes the new default for unmarked pages.

Put this in your <HEAD>:

<META http-equiv="Content-Type" content="text/html; charset=utf-8">
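The same declaration can also be made in the HTTP Content-Type response header, which takes precedence over the META tag. A minimal local test server sketch using Python's http.server, assuming a hypothetical page and the /something path from the form later in this thread; it also logs the raw request target, so you can see exactly what the browser sent.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test page mirroring the form discussed in this thread.
PAGE = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<form method="get" action="/something">
<input type="text" name="q"><input type="submit">
</form>
</body></html>"""


class FormHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is the request target as received, e.g. /something?q=%E6%BC%A2
        self.log_message("request target: %s", self.path)
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # Declare the charset in the HTTP header as well as in the META tag.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the test server until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), FormHandler).serve_forever()
```

Calling serve() and pointing IE8 at http://127.0.0.1:8000/ should then print the exact request target for each form submission in the server's log.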

If your content isn't really in UTF-8, you'll need to use an alternate method. There's an HTML attribute on FORM, accept-charset, that hints to IE that you want characters outside the ANSI code page to be sent as UTF-8, but it's far nicer to just use the correct content type.

Also, the address bar may not be the best place to look at the resulting text: the last time I checked, it didn't reliably display characters outside the ANSI code page (ACP). Make sure you're looking at the actual request data.

If you're talking about entering text into the browser's address bar or search box, rather than into a specific web page, I can't reproduce this problem on English Windows 7. Perhaps you're using a very old version of Windows whose system ANSI code page does not contain that character; Win95/Win98/WinME would certainly have that problem.
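Whether a character survives depends on the system's ANSI code page. Python ships codecs named after several Windows code pages, so a rough check of which ones can represent 漢 at all can be sketched as below; the selection of code pages is an illustrative assumption, and this does not model how any particular Windows version converts text.

```python
KAN = "\u6f22"  # 漢

# Python codec names for a few Windows ANSI code pages (illustrative list).
CODE_PAGES = {
    "cp1252": "Western European",
    "cp1251": "Cyrillic",
    "cp932": "Japanese",
    "cp936": "Simplified Chinese",
    "cp949": "Korean",
    "cp950": "Traditional Chinese",
}


def representable(text: str, codec: str) -> bool:
    """True if text survives a strict encode/decode round trip."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


for codec, label in CODE_PAGES.items():
    print(f"{codec} ({label}): {representable(KAN, codec)}")
```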

Edited to add: In IE8, entering the character you specified on a page containing the content below works exactly as expected for me; I've verified this with Fiddler. Whatever problem you are having is probably different from what you have described so far.

<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
<form accept-charset="utf-8" method="get" action="http://www.example.com/something">
<input type="text" name="q">
<input type="submit">
</form>
</BODY>
</HTML>

You don't actually need accept-charset unless the page itself uses a different encoding, but I am leaving it in for illustrative purposes. For it to be useful, at least in earlier versions of IE (things may have changed; a colleague of mine specified the behavior back in IE5 or so), you need a hidden "_charset_" field with no value, which encourages the browser to report which charset it actually used. That's superfluous in a UTF-8 page, though.
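On the server side, the value the browser fills into a hidden _charset_ field can be used to choose how to decode the rest of the query. A minimal sketch with Python's urllib.parse, assuming the browser fills _charset_ with the name of the encoding it used (the behavior described above; not every browser may do so) and falling back to UTF-8 otherwise. The function name decode_query is hypothetical.

```python
from urllib.parse import parse_qs


def decode_query(raw_query: str, default: str = "utf-8") -> dict:
    """Hypothetical helper: decode a query string, honoring _charset_ if present."""
    # First pass: the charset name is plain ASCII, so reading it via latin-1
    # is lossless and does not commit to the real encoding yet.
    probe = parse_qs(raw_query, encoding="latin-1")
    charset = probe.get("_charset_", [default])[0] or default
    # Second pass: decode every percent-encoded value with that charset.
    try:
        return parse_qs(raw_query, encoding=charset)
    except LookupError:  # unknown charset name: fall back to the default
        return parse_qs(raw_query, encoding=default)


print(decode_query("q=%E6%BC%A2&_charset_=UTF-8")["q"])  # ['漢']
```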

JasonTrue
Hi Jason, it's not working. This META tag is in there, and the <form accept-charset="utf-8"> is in there too.
TIMEX
I'm sorry, I just can't reproduce what you're seeing. On a minimalist page, Fiddler shows this session:

#11  Result: 404  Protocol: HTTP  Host: www.example.com  URL: /something?q=%E6%BC%A2  Body: 1,635  Content-Type: text/html  Process: iexplore:6836
JasonTrue