views:

63

answers:

3

This is not a pure java question and can also be related to HTML

I've written a java servlet that queries a database table and shows the result as a html table. The user can also ask to receive the result as an Excel sheet. Im creating the Excel sheet by printing the same html table, but with the content-type of "application/vnd.ms-excel". The Excel file is created fine. The problem is that the tables may contain non-english data so I want to use a UTF-8 encoding.

PrintWriter out = response.getWriter();
response.setContentType("application/vnd.ms-excel:ISO-8859-1");
//response.setContentType("application/vnd.ms-excel:UTF-8");
response.setHeader("cache-control", "no-cache");
response.setHeader("Content-Disposition", "attachment; filename=file.xls");
out.print(src);
out.flush();

The non-english characters appear as garbage (áéíóú)

Also I tried converting to bytes from String

byte[] arrByte = src.getBytes("ISO-8859-1");
String result = new String(arrByte, "UTF-8");

But I Still getting garbage, What can I do?. Thanks

UPDATE: if I open the excel file in notepad + + the type of file encoding is "UTF-8 without BOM", if I change the encoding to "UTF-8" and then open the file in Excel, the characters "áéíóú" look good.

A: 

Do you get "garbage" when you print result to standard output?

Edit (code in code tags from the comment below): response.setContentType("application/vnd.ms-excel; charset=UTF-8")

invaderkay
Only in excel (2003)
Xerg
Your java code seems fine. I think the issue might be the way you're declaring the encoding and the way Excel '03 is expecting to see it. Try doing response.setContentType("application/vnd.ms-excel; charset=UTF-8") instead.
invaderkay
Indeed the file is saved as "UTF-8 without bom" but if you change to "UTF-8" with notepad ++, it looks good in excel.
Xerg
@Xerg it seems like Windows makes heavy use of BOM...it might throw off Excel if it receives a file without it. Maybe if you add the BOM characters to the response it will work: http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
Michael Angstadt
A: 

Excel is a binary format, not a text format, so you should not need to set any encoding, since it simply doesn't apply. Whatever system you are using to build the excel file (e.g. Apache Poi) will take care of the encoding of text within the excel file.

You should not try to convert the recieved bytes to a string, just store them in a byte array or write them out to a file.

EDIT: from the comment, it doesn't sound as if you are using a "real" binary excel file, but a tab delimited text file (CSV). In that case, make sure you use consistent encoding, e.g UTF-8 throughout.

Also, before calling response.getWriter(), call setContentType first.

See HttpServletResponse.getPrintWriter()

EDIT: You can try writing the BOM. It's normally not required, but file format handling in Office is far from normal...

Java doesn't really have support for the BOM. You'll have to fake it. It means that you need to use the response outputStream rather than writer, since you need to write raw bytes (the BOM). So you change your code to this:

response.setContentType("application/vnd.ms-excel:UTF-8");
// set other headers also, "cache-control" etc..
OutputStream outputStream = response.getOutputStream();
outputStream.write(0xEF);   // 1st byte of BOM
outputStream.write(0xBB);
outputStream.write(0xBF);   // last byte of BOM
// now get a PrintWriter to stream the chars.
PrintWriter out = new PrintWriter(new OutputStreamWriter(outputStream,"UTF-8"));
out.print(src);
mdma
I'm not using any library... just text with "\t", "\n" and content type application/vnd.ms-excel
Xerg
Ok, but have you tried opening your .xls in excel? It doesn't sound like a real excel file to me, perhaps a tab-delimited CSV file? (Excel will view this file.)
mdma
Ive got the problem only when I open file in excel 2003. I see garbage in characters like "áéíóú"..
Xerg
I've updated my response, since you mentioned the BOM. Creating the BOM in java is not entirely straightforward, so I've posted code for doing that.
mdma
A: 

Try using the ServletResponse.setCharacterEncoding() method.

response.setCharacterEncoding("UTF-8");
Michael Angstadt
It doesn't work :(
Xerg