ansaurus

Question

UTF Encoding in java

Answer 1

A:

Have you tried specifying the encoding explicitly, by wrapping the OutputStream with the OutputStreamWriter(OutputStream, Charset) constructor?

notnoop 2009-07-01 04:36:48
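A minimal sketch of the constructor notnoop mentions. It assumes Java 8 or later (for StandardCharsets and UncheckedIOException). A ByteArrayOutputStream stands in for whatever stream the question actually writes to, and the class and method names are illustrative only. It shows that the same text produces different bytes depending on the Charset handed to the writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetWriterDemo {

    // Encodes text through an OutputStreamWriter that is given an explicit Charset,
    // so the platform default encoding is never consulted.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail, but Writer's methods declare IOException
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "h\u00E9llo"; // "héllo"
        // 'é' (U+00E9) is two bytes in UTF-8 (0xC3 0xA9) but one byte in ISO-8859-1 (0xE9)
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 6
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 5
    }
}
```

Passing a Charset object rather than a name like "UTF-8" also avoids the checked UnsupportedEncodingException that the String-based constructor declares.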

Answer 2

+2 A:

URL encoding is not the right thing to do to preserve UTF-8 characters. See

http://stackoverflow.com/questions/140549/what-character-set-should-i-assume-the-encoded-characters-in-a-url-to-be-in

ammoQ 2009-07-01 05:07:16
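A small illustration (not from ammoQ's answer) of why URL encoding does not by itself preserve characters. Percent-encoding only escapes bytes, and those bytes depend on the charset chosen at encoding time. Nothing in the result tells the receiver which charset was used. The sketch uses the JDK's URLEncoder.encode(String, String); the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncodingDemo {

    // Percent-encodes a value with the named charset. The output is plain ASCII,
    // so it carries no record of which charset produced the escaped bytes.
    static String percentEncode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // "café"
        System.out.println(percentEncode(word, "UTF-8"));      // caf%C3%A9
        System.out.println(percentEncode(word, "ISO-8859-1")); // caf%E9
    }
}
```

A decoder that assumes the wrong charset will turn either string back into the wrong text, which is the ambiguity the linked question discusses.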

Answer 3

+1 A:

Try doing something like:

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                                        new FileOutputStream(file),"UTF-8"));

A_M 2009-07-01 07:10:32
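A self-contained version of A_M's snippet, as a sketch assuming Java 8 or later. It writes to a temporary file instead of the question's `file`, uses try-with-resources so the writer is always closed, and reads the text back with the same charset to check that nothing was lost. StandardCharsets.UTF_8 replaces the "UTF-8" string. The class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8FileDemo {

    // Writes one line to a temporary file as UTF-8, then reads it back as UTF-8.
    static String roundTrip(String line) {
        try {
            File file = File.createTempFile("utf8-demo", ".txt");
            file.deleteOnExit();
            // Write: same shape as the answer, with a Charset constant instead of "UTF-8"
            try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(file), StandardCharsets.UTF_8))) {
                out.write(line);
                out.newLine();
            }
            // Read: name the same charset; relying on the platform default here
            // can garble the text on machines whose default is not UTF-8
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // "shalom" in Hebrew
        System.out.println(roundTrip(hebrew).equals(hebrew)); // true
    }
}
```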

Answer 4

A:

There are many possible causes for the problem you have observed. The most likely one is that REQUEST is not giving you UTF-8 in the first place. I imagine this situation will change over time, but currently there are many weak links that could be to blame: by default, neither MySQL, PHP 5, HTML, nor browsers use UTF-8, even when the data originally was UTF-8.

See stackoverflow: how-do-i-set-character-encoding-to-utf-8-for-default-html

and java.sun.com: technicalArticles--HTTPCharset

I experienced this problem with Chinese, and for that I'd recommend herongyang.com

2009-07-01 07:33:11
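The first point in the answer above, that the request may not deliver UTF-8, usually looks like the following. The browser sends UTF-8 bytes, and the server decodes them with a different charset. ISO-8859-1 is a common fallback when no charset is declared. This is a sketch in plain Java that simulates the mismatch rather than running a real server, and the class and method names are hypothetical. The proper fix is to declare the encoding before reading parameters, for example with the Servlet API's ServletRequest.setCharacterEncoding("UTF-8"). The repair step shown here is only a stopgap.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a server that receives UTF-8 bytes but decodes them as ISO-8859-1.
    static String decodeWithWrongCharset(String original) {
        byte[] sentByBrowser = original.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    // Stopgap repair: ISO-8859-1 maps each byte 0x00-0xFF to exactly one char,
    // so re-encoding recovers the original bytes, which are then decoded as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E9t\u00E9"; // "été"
        String garbled = decodeWithWrongCharset(original);
        System.out.println(garbled.equals("\u00C3\u00A9t\u00C3\u00A9")); // true: shows as "Ã©tÃ©"
        System.out.println(repair(garbled).equals(original));            // true
    }
}
```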

Answer 5

A:

It seems to me that every single web developer in the world stumbles over this. I'd like to point to an article that helped me a lot:

http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

And if you use DB2, see this IBM developerWorks article.

By the way, I think browsers don't support Unicode in addresses, because one could easily set up a phishing page using characters from one language that look like characters from another.

Tim Büthe 2009-07-01 07:46:12
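Tim Büthe's last remark is hedged ("I think"), and how a particular browser displays such names is browser policy that code cannot show. What can be illustrated is the underlying mechanism. Unicode host names are converted to an ASCII "xn--" (Punycode) form, so a look-alike name is a genuinely different host. This sketch uses the JDK's java.net.IDN class (available since Java 6). The class and method names are illustrative.

```java
import java.net.IDN;

public class IdnDemo {

    // Converts a Unicode host name to its ASCII-compatible ("xn--") form,
    // which is the form actually looked up in DNS.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        System.out.println(toAsciiHost("b\u00FCcher.example")); // xn--bcher-kva.example
        // U+0430 is CYRILLIC SMALL LETTER A, which renders like the Latin 'a'
        String lookalike = toAsciiHost("p\u0430ypal.com");
        System.out.println(lookalike.startsWith("xn--"));   // true
        System.out.println(lookalike.equals("paypal.com")); // false: a different domain
    }
}
```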

Answer 6

A:

If you are using Tomcat, please see my post on the subject here: http://nirlevy.blogspot.com/2009/02/utf8-and-hebrew-in-tomcat.html

I had the problem with Hebrew, but it's the same for every non-English language.

Nir Levy 2009-07-01 07:54:21

Answer 7

A:

Use an explicit encoding when creating the string you want to send:

final String input = ...;
final String utf8 = new String( input.getBytes( "UTF-8" ) , "UTF-8" );

dhiller 2009-07-01 11:47:19

You can't choose (or change) the encoding of a string in Java. Character encodings only come into play when you convert between strings and other media, like writing to a file--and for that you should use an OutputStreamWriter like others have suggested. `new String(input.getBytes("UTF-8"), "UTF-8")` is just an expensive no-op.

Alan Moore 2009-07-01 14:44:52
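Alan Moore's point that `new String(input.getBytes("UTF-8"), "UTF-8")` is a no-op can be checked with a short sketch. It assumes Java 7 or later for StandardCharsets, and the names are illustrative. For any well-formed string the round trip returns the same characters. The only cost is the extra copies. (A malformed string containing an unpaired surrogate would actually be altered, because the encoder substitutes '?' for it.)

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // dhiller's statement: encode to UTF-8, then decode the same bytes as UTF-8.
    static String utf8RoundTrip(String input) {
        return new String(input.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hebrew, CJK, and an emoji stored as a surrogate pair
        String input = "\u05E9\u05DC\u05D5\u05DD \u65E5\u672C \uD83D\uDE00";
        String result = utf8RoundTrip(input);
        System.out.println(result.equals(input)); // true: same characters back
        System.out.println(result == input);      // false: only a new copy was made
    }
}
```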

@Alan M: Hmm, I will check that out. We had some encoding problems when the platform's default character set was ISO-8859-1/15, so we inserted this statement to fix it. If you're saying this is a no-op, it won't do any harm if we remove it, right?

dhiller 2009-07-02 05:27:26

Right. Note that if you were doing that with a different encoding, like ISO-8859-1, you could actually corrupt the data. Any characters that weren't covered by that encoding would be replaced with junk in the encoding phase, and decoding it again would not recover them. But UTF-8 can handle any known character, so all you're doing is wasting clock cycles.

Alan Moore 2009-07-03 01:38:07
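The data loss Alan Moore describes for ISO-8859-1 can be reproduced with the same encode-then-decode pattern. This is a sketch with illustrative names, assuming Java 7 or later. String.getBytes substitutes the charset's replacement byte, '?' for ISO-8859-1, for every character the charset cannot represent. Decoding cannot bring those characters back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTripDemo {

    // Same pattern as dhiller's statement, but with a caller-chosen charset.
    static String roundTrip(String input, Charset charset) {
        return new String(input.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String input = "caf\u00E9 \u65E5\u672C"; // "café 日本"
        // 'é' exists in ISO-8859-1, but the two CJK characters do not,
        // so each of them is replaced by '?' during encoding
        System.out.println(roundTrip(input, StandardCharsets.ISO_8859_1).equals("caf\u00E9 ??")); // true
        System.out.println(roundTrip(input, StandardCharsets.UTF_8).equals(input));             // true
    }
}
```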
