I need to encode a message from a request and write it to a file. I am currently using the URLEncoder.encode() method for the encoding, but it does not give the expected result for special characters in French and Dutch.

I have also tried URLEncoder.encode("msg", "UTF-8").

Example:
Original message: Pour gérer votre GSM
After encoding: Pour g?rer votre GSM

Can anyone tell me which method I can use for this purpose?

A: 

Have you tried specifying the encoding of the output stream with OutputStreamWriter(OutputStream, Charset)?

notnoop
+2  A: 

URL encoding is not the right thing to do to preserve UTF-8 characters. See

http://stackoverflow.com/questions/140549/what-character-set-should-i-assume-the-encoded-characters-in-a-url-to-be-in

ammoQ
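To illustrate the point above, here is a minimal sketch of what URLEncoder actually produces for the message in the question. The class and method names (UrlEncodingDemo, encode, decode) are made up for this example. The output is percent-encoded text meant for URLs and form data, not a way of storing accented characters in a file.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the original question.
class UrlEncodingDemo {

    // Percent-encodes the text using UTF-8 as the underlying byte encoding.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(): turns the percent-encoded form back into the text.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent; the escape keeps this
        // source file independent of the compiler's source encoding.
        String message = "Pour g\u00e9rer votre GSM";
        String encoded = encode(message);
        System.out.println(encoded); // prints Pour+g%C3%A9rer+votre+GSM
        System.out.println(decode(encoded).equals(message)); // prints true
    }
}
```

The space becomes "+" and the accented letter becomes the two UTF-8 bytes %C3%A9, so the file would contain the escaped form rather than the readable message.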
+1  A: 

Try doing something like:

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                                        new FileOutputStream(file),"UTF-8"));
A_M
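The snippet above can be expanded into a small runnable sketch that writes the message with an explicit UTF-8 encoding and reads it back. The class name, method names, and the temporary file are assumptions made for this example; it uses StandardCharsets.UTF_8 rather than the "UTF-8" string so no checked exception for the charset name is needed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original answer.
class Utf8FileDemo {

    // Writes the text to the file, encoding characters as UTF-8 bytes
    // regardless of the platform default charset.
    static void writeUtf8(Path file, String text) {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file and decodes it again as UTF-8.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("message", ".txt");
            try {
                writeUtf8(file, text);
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String message = "Pour g\u00e9rer votre GSM"; // \u00e9 = e with acute accent
        System.out.println(roundTrip(message).equals(message)); // prints true
    }
}
```

Whatever reads the file later must also decode it as UTF-8, otherwise the accented characters will still look wrong.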
A: 

There are many possible causes for the problem you have observed. The primary cause is that the request is not giving you UTF-8 in the first place. I imagine that this situation will change over time, but currently there are many weak links that could be to blame: MySQL, PHP5, HTML, and browsers do not use UTF-8 by default, even though the data may originally have been UTF-8.

See stackoverflow: how-do-i-set-character-encoding-to-utf-8-for-default-html

and java.sun.com: technicalArticles--HTTPCharset

I experienced this problem with Chinese, and for that I'd recommend herongyang.com
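As a rough illustration of a "weak link", the sketch below shows what happens when bytes that were sent as UTF-8 are decoded with a different charset, here ISO-8859-1. This is only one possible failure mode, and the class and method names are invented for the example. In a servlet, calling request.setCharacterEncoding("UTF-8") before reading any parameters is one commonly used way to avoid it for request bodies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class WrongCharsetDemo {

    // Simulates a receiver that decodes UTF-8 bytes as ISO-8859-1.
    static String decodeUtf8BytesAsLatin1(String original) {
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    // Undoes the mistake. This works only because ISO-8859-1 maps every
    // byte value to a character, so no information was lost.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "g\u00e9rer"; // \u00e9 = e with acute accent
        String garbled = decodeUtf8BytesAsLatin1(original);
        // The two UTF-8 bytes of the accented letter show up as two characters.
        System.out.println(garbled.equals("g\u00c3\u00a9rer")); // prints true
        System.out.println(repair(garbled).equals(original));   // prints true
    }
}
```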

A: 

It seems to me that every single web developer in the world stumbles over this. I'd like to point to an article that helped me a lot:

http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

And if you use DB2: this IBM developerWorks article

By the way, I think browsers don't support Unicode in addresses because one could easily set up a phishing page using characters from one language that look similar to characters in another language.

Tim Büthe
A: 

If you are using Tomcat, please see my post on the subject here: http://nirlevy.blogspot.com/2009/02/utf8-and-hebrew-in-tomcat.html

I had the problem with Hebrew, but it's the same for every non-English language.

Nir Levy
A: 

Use an explicit encoding when creating the string you want to send:

final String input = ...;
final String utf8 = new String( input.getBytes( "UTF-8" ) , "UTF-8" );
dhiller
You can't choose (or change) the encoding of a string in Java. Character encodings only come into play when you convert between strings and other media, like writing to a file--and for that you should use an OutputStreamWriter like others have suggested. `new String(input.getBytes("UTF-8"), "UTF-8")` is just an expensive no-op.
Alan Moore
@Alan M: Hmm, I will check that out. We've had some encoding problems when the default character set on the platform was ISO-8859-1/15 so we inserted this statement to fix it. If you're saying this is a no-op it won't do any harm if we remove it, right?
dhiller
Right. Note that if you were doing that with a different encoding, like ISO-8859-1, you could actually corrupt the data. Any characters that weren't covered by that encoding would be replaced with junk in the encoding phase, and decoding it again would not recover them. But UTF-8 can handle any known character, so all you're doing is wasting clock cycles.
Alan Moore
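The point made in these comments can be checked with a short sketch. The class and method names are made up for this example. It runs the same getBytes/new String round trip once with UTF-8 and once with ISO-8859-1, using the euro sign as a character that ISO-8859-1 does not cover.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original comments.
class RoundTripDemo {

    // Encodes the string to bytes and immediately decodes it again
    // with the same charset.
    static String roundTrip(String input, Charset charset) {
        return new String(input.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // euro sign, not representable in ISO-8859-1

        // UTF-8 can represent the character, so nothing changes.
        System.out.println(
                roundTrip(euro, StandardCharsets.UTF_8).equals(euro)); // prints true

        // ISO-8859-1 cannot, so the encoder substitutes '?' and the
        // original character cannot be recovered by decoding.
        System.out.println(
                roundTrip(euro, StandardCharsets.ISO_8859_1)); // prints ?
    }
}
```

Note that the accented e from the question is covered by ISO-8859-1, so that particular character would survive either round trip; the loss only shows up for characters outside the charset.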