I've written a Java program that generates an m3u file from a CD ripped with k3b, which pretty much preserves special character encodings in artist, album and track names. I then place these m3u files on a server and generate a GWT web application in which each m3u file name is the target of an HTML anchor tag. In over 99% of cases this all works perfectly. In a few cases, special characters cause the link to fail.

One failing example is the Movits! album Äppelknyckarjazz (note the first character, which the URI constructor encodes as %C3%84). Since the client is GWT, view source does not show the link :-( But when I hover over the link, Firefox shows the correctly decoded URL. When I click the link, Firefox fails with: "...Äppelknyckarjazz.m3u was not found on this server". It is as though different character encoding schemes are at play, but frankly my brain hurts from trying to unravel the puzzle at this level.

So there are really two questions:

1) Is my problem an encoding scheme issue?

2) Assuming it is, how can I maintain consistency across the various pieces of the application (Java m3u generator, GWT client, Firefox browser, Apache web server)?

+2  A: 
String result = java.net.URLEncoder.encode("Äppelknyckarjazz", "UTF-8");

I think this is a solution for you.
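
A minimal sketch of that idea, assuming UTF-8 is the encoding you want everywhere. The class name `EncodeName`, the host `example.com` and the `/m3u/` path are made up for illustration. One caveat: `URLEncoder` produces form encoding, so a space becomes `+` rather than `%20`. For a file name in a URL path, the multi-argument `java.net.URI` constructor followed by `toASCIIString()` may be the better fit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class EncodeName {
    // Form-style encoding (application/x-www-form-urlencoded) with an explicit charset.
    static String formEncode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Path-style encoding: the URI constructor quotes illegal characters,
    // and toASCIIString() percent-encodes non-ASCII characters as UTF-8.
    static String pathEncode(String host, String path) throws URISyntaxException {
        return new URI("http", host, path, null).toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        String name = "\u00C4ppelknyckarjazz"; // \u00C4 is the letter Ä
        System.out.println(formEncode(name));
        System.out.println(formEncode("a b"));
        // example.com and /m3u/ are placeholders, not the real server layout
        System.out.println(pathEncode("example.com", "/m3u/" + name + ".m3u"));
    }
}
```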

Martijn Courteaux
I believe I am achieving this result using the URI constructor. The m3u file name and file content are encoded correctly.
pajato0
A: 

Ä can be encoded as %C3%84 (UTF-8) or %C4 (Latin-1). It sounds like you are using a mixture of Latin-1 and UTF-8. You need to make sure the same encoding is used across all your systems.
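
To see the two encodings side by side, here is a small sketch. The class name `CharsetCompare` is made up; the only library call is the standard `URLEncoder.encode(String, String)`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CharsetCompare {
    // Percent-encode a string using the named charset.
    static String encode(String s, String charset) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, charset);
    }

    public static void main(String[] args) throws Exception {
        String a = "\u00C4"; // the letter Ä
        System.out.println(encode(a, "UTF-8"));      // %C3%84 (two bytes)
        System.out.println(encode(a, "ISO-8859-1")); // %C4 (one byte)
    }
}
```

If one piece of the chain writes the first form and another expects the second, the names no longer match.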

In the rare case that you can't control the encoding, see my answer to this question:

http://stackoverflow.com/questions/887148/how-to-determine-if-a-string-contains-invalid-encoded-characters

ZZ Coder
Of the two answers, this one gets to the meat of the matter. And the referenced material captures the essence of the issue. Many thanks. -pmr
pajato0
A: 

First you have to declare a charset on your HTML page. UTF-8 is the best choice.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Then you should configure your web server to interpret requests from clients as UTF-8. When using Tomcat, set the URIEncoding attribute on your Connector element:

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />
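
Why the server-side setting matters can be sketched in plain Java. This only simulates the decoding step with `URLDecoder`; it is not how any particular server is implemented, and the class name `DecodeMismatch` is made up. The same request path decoded with two charsets yields two different file names, and only one of them exists on disk.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeMismatch {
    // Decode a percent-encoded string using the named charset.
    static String decode(String s, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, charset);
    }

    public static void main(String[] args) throws Exception {
        String requested = "%C3%84ppelknyckarjazz.m3u";
        String asUtf8 = decode(requested, "UTF-8");        // starts with Ä (U+00C4)
        String asLatin1 = decode(requested, "ISO-8859-1"); // starts with U+00C3 U+0084
        System.out.println(asUtf8.equals("\u00C4ppelknyckarjazz.m3u")); // true
        System.out.println(asUtf8.equals(asLatin1));                    // false
    }
}
```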
Witek