I want to read web page A, which the browser reports as ISO-8859-1 encoded, and return its content in UTF-8 as part of web page B.

That is, I want to show the content of page A in the same charset I use for the rest of page B, which is UTF-8.

How do I do this in Java/Groovy?

Thanks in advance.

+1  A: 

You don't say what stack you're building on or how you're accessing the content, but the general mechanism for such a transcoding operation is to use UTF-16 as an intermediary; that is, decode the ISO-8859-1 bytes to UTF-16 chars, then encode those chars as UTF-8 bytes.

You could read the chars with an InputStreamReader (with an ISO-8859-1 Charset), then write them out as bytes via an OutputStreamWriter (with a UTF-8 Charset).
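The reader/writer approach could be sketched roughly as follows in plain Java. This is a minimal sketch, not a definitive implementation: the class name `Latin1ToUtf8`, the method name `transcode`, and the buffer size are made up for illustration, and it assumes the source bytes really are ISO-8859-1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for this example.
final class Latin1ToUtf8 {

    // Decodes ISO-8859-1 bytes from 'in' to chars and re-encodes them as UTF-8 bytes on 'out'.
    // Both streams are closed when the method returns.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096]; // arbitrary buffer size
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }
}
```

The input stream could come from something like `new URL(...).openStream()`, and the output stream from whatever is producing page B.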

Some APIs provide encoding operations as part of their I/O classes (e.g. ServletResponse.getWriter()).

I'm ignoring any need to parse and transform the data, which is a whole other can of worms.

McDowell
+1  A: 

In Groovy you could write something like this:

def source = new URL("http://www.google.com").getText("ISO-8859-1")
def target = new String(source.getBytes("UTF-8"), "UTF-8")
Christoph Metzendorf
Thanks, it works perfectly.
damian
@Christoph Metzendorf - I don't get the 2nd line. According to the Groovy API, `source` will be a (UTF-16) `java.lang.String`. You convert it from a string to a UTF-8 encoded byte array and back to a (UTF-16 encoded) string again.
McDowell
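
To illustrate the point in the comment above, here is a small Java sketch (the class and method names are hypothetical). It assumes well-formed input text. Encoding a `String` to UTF-8 bytes and immediately decoding those bytes again yields an equal `String`, so the only step that actually produces UTF-8 data is the `getBytes` call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are assumptions for this example.
final class RoundTripDemo {

    // Mirrors the second line of the Groovy snippet: encode to UTF-8, then decode again.
    static String roundTrip(String source) {
        return new String(source.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // The step that actually yields UTF-8 encoded data, as a byte array.
    static byte[] toUtf8(String source) {
        return source.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, `RoundTripDemo.roundTrip("caf\u00e9")` equals the original string, while `RoundTripDemo.toUtf8("\u00e9")` returns the two bytes 0xC3 0xA9.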