The problem most likely lies at the exact point where you're reading, writing and/or displaying those characters.
If you're reading those characters using a Reader, then you need to construct an InputStreamReader first, using the 2-argument constructor, which lets you pass the correct encoding (here, UTF-8) as the 2nd argument. E.g.
reader = new InputStreamReader(url.openStream(), "UTF-8");
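As a minimal, self-contained sketch of the same idea: the class and method names below (Utf8ReadSketch, readAll) are made up for illustration, and an in-memory byte array stands in for url.openStream(). It uses StandardCharsets.UTF_8 (available since Java 7) instead of the "UTF-8" string, which avoids the checked UnsupportedEncodingException but otherwise should behave the same.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class Utf8ReadSketch {

    // Decodes the whole stream as UTF-8, independent of the platform default charset.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        String text = readAll(new ByteArrayInputStream(bytes));
        // Five bytes decode to four characters when read as UTF-8.
        System.out.println(text.length());
    }
}
```

If the same bytes were decoded with a single-byte encoding such as ISO-8859-1 instead, the last two bytes would come out as two separate characters, which is the typical symptom of this kind of problem.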
If you're, for example, writing those characters to a file, then you need to construct an OutputStreamWriter using the 2-argument constructor, which lets you pass the correct encoding (here, UTF-8) as the 2nd argument. E.g.
writer = new OutputStreamWriter(new FileOutputStream("/page.html"), "UTF-8");
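A rough sketch of the writing side, under a few assumptions: the names Utf8WriteSketch and writeUtf8 are hypothetical, a temporary file replaces /page.html so the example can run anywhere, and StandardCharsets.UTF_8 is used in place of the "UTF-8" string.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper class, not part of any library.
final class Utf8WriteSketch {

    // Encodes the text as UTF-8 while writing, independent of the platform default charset.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this demonstration.
        File file = File.createTempFile("page", ".html");
        file.deleteOnExit();
        writeUtf8(file, "caf\u00e9");
        // The accented character takes two bytes in UTF-8, so four characters become five bytes.
        System.out.println(Files.readAllBytes(file.toPath()).length);
    }
}
```

Closing the writer (here through try-with-resources) matters as well: it flushes any buffered characters and closes the underlying file stream.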
If you're, for example, writing it all plain vanilla to the stdout (e.g. with System.out.println(line) and so on), then you need to ensure that the stdout itself is using the correct encoding (here, UTF-8). In an IDE such as Eclipse, you can configure this via Window > Preferences > General > Workspace > Text file encoding.
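On the Java side, one possible approach is to wrap the stdout in a PrintStream with an explicit encoding, as sketched below. The names Utf8StdoutSketch and utf8PrintStream are made up for illustration. Note that this only controls which bytes Java emits; the console or IDE that displays them still has to be configured to interpret them as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of any library.
final class Utf8StdoutSketch {

    // Wraps the target stream in an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8PrintStream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8PrintStream(System.out);
        // Whether this displays correctly depends on the console's own encoding.
        out.println("caf\u00e9");
    }
}
```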