Hi,

I have a webpage that is declared (through its header) as WIN-1255. A Java program creates text strings that are automatically embedded in the page. The problem is that the original strings are encoded in UTF-8, so the text field in the page shows up as gibberish.

Unfortunately, I cannot change the page encoding - it's required by a customer's proprietary system.

Any ideas?

UPDATE:

The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.

SECOND UPDATE:

Thanks for all the responses. I managed to convert the string and still got gibberish. The problem was that the XML encoding has to be set in addition to the header encoding.

Adam

A: 

What's embedding the data in the page? Either it should read it as text (in UTF-8) and then write it out again in the web page's encoding (Win-1255) or you should change the Java program to create the files (or whatever) in Win-1255 to start with.

If you can give more details about how the system works (what's generating the web page? How does it interact with the Java program?) then it will make things a lot clearer.

Jon Skeet
I really wish I could've done that, but problem is I get the String in UTF-8 and must deliver the whole page as a WIN-1255. Will update my answer.
Adam Matan
@Adam: What exactly do you mean by "I get the String in UTF-8"? You still haven't explained how the system works. If you've got a String in Java with the right data in, that doesn't inherently *have* an encoding (or rather, it's always UTF-16). But we don't know whether you've got the whole system in Java, or what...
Jon Skeet
@Jon Sorry for the fuss, I'll try to clarify. 1. I know that the original string representation is supposed to be irrelevant, but I tried to give some background. 2. The problem can be summarized as "How can I create an RSS feed page with WIN-1255 strings?" Thanks for the help.
Adam Matan
@Adam: And again, we'll need to know what the server configuration looks like. What is generating the page? Is this a JSP, a servlet, something else?
Jon Skeet
@Jon Problem solved (see another update in my answer). The server is, again, external to my production team and I know very little about it.
Adam Matan
+1  A: 

Assuming you have control of the original (properly represented) strings, and simply need to output them in win-1255:

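import java.nio.ByteBuffer;
import java.nio.charset.Charset;

Charset win1255 = Charset.forName("windows-1255");
// Encode the Java (UTF-16) string into windows-1255 bytes.
ByteBuffer bb = win1255.encode(someString);
// Copy the encoded bytes out of the buffer into a plain array.
byte[] ba = new byte[bb.remaining()];
bb.get(ba);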

Then, simply write the contents of ba at the appropriate place.

EDIT: What you do with ba depends on your environment. For instance, if you're using servlets, you might do:

ServletOutputStream os = ...
os.write(ba);

We should also not overlook the option of calling setContentType("text/html; charset=windows-1255") and then using getWriter normally. You did not make it completely clear whether windows-1255 is being set in a meta tag or in the HTTP response header.

You clarified that you have a UTF-8 file that you need to decode. If you're not already decoding the UTF-8 strings properly, this should be no big deal. Just look at InputStreamReader(someInputStream, Charset.forName("utf-8")).
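To tie the two halves together, here is a minimal sketch of the decode-then-encode round trip with plain java.io. The class name Transcoder and the method transcode are made up for illustration, and it assumes your JDK ships the windows-1255 charset (standard full JDK builds do):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Transcoder {
    static final Charset WIN_1255 = Charset.forName("windows-1255");

    // Reads UTF-8 bytes from 'in' and writes the same text to 'out' as windows-1255.
    static void transcode(InputStream in, OutputStream out) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            Writer writer = new OutputStreamWriter(out, WIN_1255);
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
            // Flush so the encoder pushes everything into 'out'.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A Hebrew word written with escapes so the source file encoding does not matter.
        byte[] utf8 = "\u05E9\u05DC\u05D5\u05DD".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), out);
        System.out.println(utf8.length + " UTF-8 bytes in, " + out.size() + " windows-1255 bytes out");
    }
}
```

Note that characters with no windows-1255 equivalent are replaced by the encoder (typically with '?'), so this only round-trips text that the target code page can represent.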

Matthew Flaschen
Thanks! What should I do with the byte array now? ba[i] is an integer, I need some representation conversion here.
Adam Matan
A: 

The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.

In this case, use a parser to load the UTF-8 XML. This should correctly decode the data to UTF-16 character data (Java Strings are always UTF-16). Your output mechanism should encode from UTF-16 to Windows-1255.
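As a rough sketch of that approach using only the JAXP classes bundled with the JDK: parse the UTF-8 feed into a DOM, then serialize it with the output encoding set to windows-1255. The class name FeedReencoder and the method reencode are hypothetical, and I'm assuming the JDK's built-in serializer accepts windows-1255 as an encoding name. Setting OutputKeys.ENCODING also rewrites the encoding in the XML declaration, which is the part the second update in the question mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper, not part of any library.
final class FeedReencoder {
    // Parses XML (the parser honors the encoding in its declaration, UTF-8 here)
    // and serializes the same document as windows-1255 bytes.
    static byte[] reencode(byte[] utf8Xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // feeds commonly use namespaces
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(utf8Xml));

            // Identity transform: copies the DOM to the output unchanged.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Controls both the byte encoding and the XML declaration.
            t.setOutputProperty(OutputKeys.ENCODING, "windows-1255");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Could not re-encode feed", e);
        }
    }

    public static void main(String[] args) {
        String feed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss><title>\u05E9\u05DC\u05D5\u05DD</title></rss>";
        byte[] out = reencode(feed.getBytes(StandardCharsets.UTF_8));
        System.out.println(out.length + " bytes written");
    }
}
```

Passing a byte stream (rather than a Writer) to StreamResult matters here: it lets the serializer do the encoding itself, so the declaration and the actual bytes cannot disagree.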

McDowell
A: 
byte[] originalUtf8; // input: the UTF-8 bytes

// UTF-8 bytes to Java String:
String internal = new String(originalUtf8, Charset.forName("utf-8"));
// Java String to windows-1255 bytes:
byte[] win1255 = internal.getBytes(Charset.forName("cp1255"));

// output: write the win1255 bytes
josefx
+1 Thanks! It's insightful, but a bit too complicated for my current needs.
Adam Matan
+2  A: 

To the point, you need to set the encoding of the response writer. With only a response header you're basically only instructing the client application which encoding to use to interpret/display the page. This ain't going to work if the response itself is written with a different encoding.
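To see why the header alone is not enough, here is a small sketch of what the client effectively does when the body is written as UTF-8 but labelled windows-1255. The class MojibakeDemo and the method misread are made-up names for illustration only:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class MojibakeDemo {
    // Simulates a client decoding UTF-8 bytes as if they were windows-1255.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1255"));
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";
        String seen = misread(hebrew);
        // Each Hebrew letter is two bytes in UTF-8, so the client sees two wrong characters per letter.
        System.out.println(hebrew.length() + " characters sent, " + seen.length() + " characters displayed");
        // Plain ASCII survives, which is why such bugs often go unnoticed in testing.
        System.out.println(misread("abc").equals("abc"));
    }
}
```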

The context where you have this problem is entirely unclear (please elaborate about it as well in future problems like this), so here are several solutions:

If it is JSP, you need to set the following in top of JSP to set the response encoding:

<%@ page pageEncoding="windows-1255" %>

If it is Servlet, you need to set the following before any first flush to set the response encoding:

response.setCharacterEncoding("windows-1255");

Both, by the way, implicitly set the Content-Type response header with a charset parameter to instruct the client to use the same encoding to interpret/display the page. Also see this article for more information.

If it is a homegrown application which relies on the basic java.net and/or java.io APIs, then you need to write the characters through an OutputStreamWriter, constructed with the two-argument constructor that lets you specify the encoding:

Writer writer = new OutputStreamWriter(someOutputStream, "windows-1255");
BalusC