I have a web application (in fact, it's just a servlet) which receives data from 3 different sources:
- Source A is an HTML document written in UTF-8, and sends the data via <form method="get">.
- Source B is written in ISO-8859-1, and sends the data via <form method="get">, too.
- Source C is written in ISO-8859-1, and sends the data via <a href="http://my-servlet-url?param=value&param2=value2&etc">.
The servlet receives the request params and URL-decodes them using UTF-8. As you can expect, A works without problems, while B and C fail (you can't URL-decode in UTF-8 something that's encoded in ISO-8859-1...).
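To illustrate the mismatch, here is a minimal sketch of what I believe happens to a value like "café" depending on the charset of the page that sent it. The class and method names are made up for the example, and it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder; the real servlet does its decoding elsewhere.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the actual servlet.
class CharsetMismatchDemo {

    // Simulates the browser percent-encoding a value using the page's charset.
    static String encode(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Simulates what the servlet does today: always decode as UTF-8.
    static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", escaped to avoid source-encoding issues

        String fromUtf8Page = encode(original, StandardCharsets.UTF_8);        // source A
        String fromLatin1Page = encode(original, StandardCharsets.ISO_8859_1); // sources B and C

        System.out.println(fromUtf8Page);   // caf%C3%A9
        System.out.println(fromLatin1Page); // caf%E9

        // A round-trips correctly; B and C no longer match the original,
        // because the single byte 0xE9 is not a valid UTF-8 sequence.
        System.out.println(original.equals(decodeAsUtf8(fromUtf8Page)));   // true
        System.out.println(original.equals(decodeAsUtf8(fromLatin1Page))); // false
    }
}
```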
I can make slight modifications to B and C, but I am not allowed to change them from ISO-8859-1 to UTF-8, which would solve all the problems.
In B, I've been able to solve the problem by adding accept-charset="UTF-8" to the <form>, so it sends the data in UTF-8 even though the page is ISO-8859-1.
What can I do to fix C?
Alternatively, is there any way to determine the charset on the servlet, so I can call URL-decode with the right encoding in each case?
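To make the second question concrete, this is a rough sketch of the kind of detection I have in mind: percent-decode the raw value to bytes, try a strict UTF-8 decode, and fall back to ISO-8859-1 if the bytes are not valid UTF-8. The class and method names are hypothetical, and it assumes the servlet can get at the undecoded value (for example by parsing HttpServletRequest.getQueryString() itself). It is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8 and would be misread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an existing servlet or library API.
class LenientParamDecoder {

    // Turns a percent-encoded value into raw bytes without assuming any charset.
    static byte[] percentDecodeToBytes(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            // Form encoding uses '+' for a space; everything else is assumed
            // to be plain ASCII and is written as a single byte.
            out.write(c == '+' ? ' ' : c);
        }
        return out.toByteArray();
    }

    // Tries strict UTF-8 first, then falls back to ISO-8859-1.
    static String decode(String encoded) {
        byte[] bytes = percentDecodeToBytes(encoded);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the sender was an ISO-8859-1 page.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        String expected = "caf\u00e9"; // "café"
        System.out.println(expected.equals(decode("caf%C3%A9"))); // sent by a UTF-8 page
        System.out.println(expected.equals(decode("caf%E9")));    // sent by an ISO-8859-1 page
    }
}
```

Trying UTF-8 first seems the safer order, since almost any byte sequence is "valid" ISO-8859-1 and that decode would never fail.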
Edit: I've just found this, which seems to solve my problem. I still have to run some tests to determine whether it impacts performance, but I think I'll stick with that solution.