I need to post a form from a new website that is UTF-8 encoded. The problem is that I need to post it to a legacy site encoded as Western European (ISO-8859-1). Certain characters get mangled in the post (such as Danish special characters).

It is not possible to change the character encoding on the legacy website, as it would definitely break things on the old site (so that's a no-go). I might be able to do some magic with the data (some branching on input) on the legacy site, but that would be the fallback solution.

I have jQuery on the client to help with whatever encoding tricks might be possible there.

One possible solution is to post from the new UTF-8 page to another new page that converts the encoding server-side and reposts the data to the legacy site, but that just seems ugly...

The new site runs ASP.NET MVC and the old legacy site is classic ASP (not ASP.NET), if that makes a difference (I hope it doesn't, since I'd really like to handle this client-side).

A:

IIUC, it doesn't really matter what the web pages on the old site are encoded in, as the form will be on the new site. What matters is which encoding the old site's server expects. And if the server expects the data to be submitted in Latin-1, you only have two choices:

  1. change the server to accept the data in UTF-8 (perhaps under a different URL)
  2. make sure the client submits the data in Latin-1
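
To make the point about the server's expected encoding concrete, here is a small sketch (JavaScript run under Node.js, which is an assumption; neither site uses Node) comparing the bytes a UTF-8 page sends for the Danish letters æøå with the Latin-1 bytes the old server expects, plus the garbled text the server ends up with if it decodes UTF-8 bytes as Latin-1. `Buffer` and its `"latin1"` encoding are Node built-ins; the variable names are illustrative only.

```javascript
// Sketch: what a Latin-1 server sees when it receives UTF-8 form data.
// "latin1" is Node's Buffer name for ISO-8859-1.

const text = "æøå"; // U+00E6, U+00F8, U+00E5: all representable in Latin-1

// encodeURIComponent always percent-encodes the UTF-8 bytes,
// which is what a UTF-8 page submits by default.
const utf8Encoded = encodeURIComponent(text); // "%C3%A6%C3%B8%C3%A5"

// The same text as Latin-1 bytes: exactly one byte per character.
const latin1Bytes = Buffer.from(text, "latin1");
const latin1Encoded = Array.from(
  latin1Bytes,
  (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
).join(""); // "%E6%F8%E5"

// A server that decodes the UTF-8 bytes as Latin-1 gets two characters per letter.
const misread = Buffer.from(text, "utf8").toString("latin1"); // "Ã¦Ã¸Ã¥"

console.log(utf8Encoded, latin1Encoded, misread);
```

Each Danish letter costs two bytes in UTF-8 but one in Latin-1, which is why a Latin-1 server shows two wrong characters for every special character it receives.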

As you have ruled out option 1, your only choice is option 2 (but do reconsider option 1). For option 2, you again have choices, one being to use a proxy as you propose. However, it would probably be better if the page containing the form were encoded in Latin-1 (despite the rest of the site being UTF-8). This should work well as long as you don't need to display non-Latin-1 text on that page (such as Chinese). You just have to tell ASP.NET that this specific page should be rendered in Latin-1 (and the web server should send an appropriate Content-Type header).

Martin v. Löwis
The post I make is actually a search query to another site. The search form appears on many different pages, including the most important page of the new site, so changing the encoding for the post's sake would mean setting the encoding for almost all of the new site, which has to be UTF-8 (a requirement). I'll ponder some more on what you wrote :)
Per Hornshøj-Schierbeck
A: 

You can control the encoding used in the form, regardless of the encoding of the hosting page. For example,

  <form accept-charset="latin-1" ... >
ZZ Coder
This seems to work only in Firefox (not IE)?
Per Hornshøj-Schierbeck
It works in IE for us. Please use the official name, "ISO-8859-1". IE used to have a bug where it treated Latin-1 as CP-1252, which may cause problems with certain characters.
ZZ Coder
Yeah, it seems to cause problems with Danish special characters.
Per Hornshøj-Schierbeck
A: 

OK, so we ended up URL-encoding the data before submitting it to the landing page.

This avoided the two problems the other solutions had: 1) needing an extra landing/jump page to transform the values, and 2) having to rewrite a large part of the old legacy site (which is being migrated) to accept UTF-8.

Anyway, we put the logic for submitting data from our site to the legacy site into a jQuery plugin, which escapes the value of every element in the form before posting it. Actually, the landing (old) page requires the data in GET format, so we just end up setting window.location to the resulting query string, but I guess we could just as well have submitted the form with a normal POST if we wanted to.
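
For anyone attempting the same thing, here is a rough sketch of the encoding step. It is not our actual plugin; `encodeLatin1Component`, `toLatin1QueryString` and the legacy URL are hypothetical names. It assumes the legacy page decodes %XX escapes as ISO-8859-1 bytes, and it replaces characters above U+00FF (which Latin-1 cannot represent) with "?", which is also an assumption. The input is an array of `{ name, value }` objects, the shape jQuery's `.serializeArray()` returns.

```javascript
// Hypothetical sketch: percent-encode form values as ISO-8859-1 and build a
// GET query string for a legacy server.

// Percent-encode one string using Latin-1 bytes.
function encodeLatin1Component(value) {
  let out = "";
  for (const ch of value) { // for...of iterates by code point
    const code = ch.codePointAt(0);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved characters pass through unchanged
    } else if (code <= 0xff) {
      // one Latin-1 byte, written as %XX
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    } else {
      out += "%3F"; // no Latin-1 byte exists; send "?" as a placeholder (assumption)
    }
  }
  return out;
}

// Join name/value pairs, e.g. the output of jQuery's .serializeArray().
function toLatin1QueryString(fields) {
  return fields
    .map((f) => encodeLatin1Component(f.name) + "=" + encodeLatin1Component(f.value))
    .join("&");
}

// In the browser the redirect would then be (hypothetical URL):
//   window.location = "http://legacy.example.com/search.asp?" + toLatin1QueryString(fields);
const fields = [{ name: "q", value: "rødgrød med fløde" }];
const query = toLatin1QueryString(fields);
console.log(query);
```

JavaScript's built-in `escape()` produces the same %XX escapes for characters below U+0100 (for example `escape("ø")` gives "%F8"), but it leaves "+" unescaped, which servers typically decode as a space in a query string, and it emits %uXXXX for everything else. A small explicit encoder avoids both surprises.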

Per Hornshøj-Schierbeck