I am building a form that needs to accept characters encoded in SHIFT_JIS and then send those results via email to a recipient. I've tried to simply capture the results from the $_POST variable and then to insert them into a block of text like this:

$NameJp = $_POST['NameJp'];
$contents = <<<TEST

Name: $NameJp

...
TEST;

but that doesn't appear to work (which doesn't really surprise me). This is my first attempt at dealing with non-ASCII characters in PHP and I am hoping that people might have some suggestions. Perhaps I am missing a simple function to encode the text.

Also, are there any other potential pitfalls that I might encounter?

Thanks.

+2  A: 

Also, are there any other potential pitfalls that I might encounter?

Yes. :)

It's all fine and good to receive text in SHIFT_JIS, but you'll have to handle it as SHIFT_JIS all the way after that and inform everybody else that this text is in SHIFT_JIS. For example, if you insert it into an email, you'll need to set appropriate email headers informing clients that this email contains text in SHIFT_JIS. If you need to display it on a web page or just in debug output, you'll need to make sure the browser or debugging environment handles the text as SHIFT_JIS. If you mix it with other text, you should make sure both are in the same encoding.

Honestly, SHIFT_JIS is antiquated and terrible to work with. You should use UTF-8 if at all possible. If you absolutely, positively need to accept SHIFT_JIS and do a lot of work on it, you may want to convert it to UTF-8 internally and convert it back to whatever output encoding is necessary when the time comes. You can use iconv to do so.
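A minimal sketch of the convert-at-the-boundaries approach described above, assuming the form really submits SHIFT_JIS bytes and that the iconv extension is available. The helper names sjisToUtf8 and utf8ToSjis are hypothetical and only serve as illustration:

```php
<?php
// Hypothetical helpers (not from the original post): convert at the edges,
// work in UTF-8 everywhere in between.

// Convert raw SHIFT_JIS bytes (e.g. a value from $_POST) to UTF-8.
function sjisToUtf8(string $raw): string
{
    $utf8 = iconv('SHIFT_JIS', 'UTF-8', $raw);
    if ($utf8 === false) {
        // iconv() returns false when the input is not valid in the source encoding.
        throw new InvalidArgumentException('Input is not valid SHIFT_JIS');
    }
    return $utf8;
}

// Convert UTF-8 text back to SHIFT_JIS for output that requires it.
function utf8ToSjis(string $text): string
{
    $sjis = iconv('UTF-8', 'SHIFT_JIS', $text);
    if ($sjis === false) {
        // Fails when the text contains characters SHIFT_JIS cannot represent.
        throw new InvalidArgumentException('Text cannot be represented in SHIFT_JIS');
    }
    return $sjis;
}
```

With these in place, something like `$NameJp = sjisToUtf8($_POST['NameJp']);` would give a UTF-8 string to work with, and utf8ToSjis() would only be called when the output really has to be SHIFT_JIS.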

Email headers are special, by the way, in that they cannot contain anything but ASCII characters. The subject of an email is part of the header. To send non-ASCII characters in a subject line, you'll need to MIME-encode it using mb_encode_mimeheader.
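As a rough sketch of what that could look like, assuming the text is held as UTF-8 internally and the mbstring extension is loaded. The function name buildMailParts and the choice of a base64-encoded body are assumptions made for illustration, not something from the question:

```php
<?php
// Assumption: strings are UTF-8 internally. mb_encode_mimeheader() interprets
// its input using the mbstring internal encoding, so set that first.
mb_internal_encoding('UTF-8');

// Hypothetical helper: prepares a subject, headers, and body suitable for mail().
function buildMailParts(string $subject, string $body): array
{
    // MIME-encode the subject so the header itself contains only ASCII.
    // The indent accounts for the "Subject: " prefix when folding long lines.
    $encodedSubject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n", strlen('Subject: '));

    // Tell the mail client which encoding the body uses.
    $headers = implode("\r\n", [
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: base64',
    ]);

    return [
        'subject' => $encodedSubject,
        'headers' => $headers,
        // Base64 keeps the body 7-bit safe; chunk_split() wraps it at 76 characters.
        'body'    => chunk_split(base64_encode($body)),
    ];
}
```

The result could then be passed to mail(), for example `mail($to, $parts['subject'], $parts['body'], $parts['headers']);`, where $to is whatever recipient address the form is meant to reach.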

Also, the obligatory reference to: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

deceze
@deceze - Thanks for the comments. It does appear that I can convert the form to use UTF-8 which does seem more standard and better supported.
Joe Corkery
@Joe The thing is that SHIFT_JIS is limited to a certain subset of characters, while UTF-8 can represent (virtually) any character. Working in SHIFT_JIS (or ASCII, or any limited character set) requires great care not to attempt to encode "invalid" characters. For this reason, most tools prefer Unicode by default, which means you'll have a smoother ride developing for it, but you must still take care to use UTF-8 across the board. Other than that, SHIFT_JIS should be supported just fine as well.
deceze
@deceze - I just managed to resolve the problem. After investigating more of the details of submitting Unicode characters through forms, I discovered the 'accept-charset' attribute for forms. Once I specified that, everything started working as expected.
Joe Corkery
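For readers following along, a hedged sketch of what the 'accept-charset' fix might look like. The field name NameJp comes from the question; the function names, the submit.php target, and the UTF-8 validation step are assumptions added for illustration:

```php
<?php
// Hypothetical form markup: accept-charset asks the browser to submit UTF-8.
function renderForm(): string
{
    return <<<HTML
<form method="post" action="submit.php" accept-charset="UTF-8">
  <input type="text" name="NameJp">
  <input type="submit" value="Send">
</form>
HTML;
}

// Hypothetical reader: returns the submitted name only if it is valid UTF-8.
function readNameJp(array $post): ?string
{
    $value = $post['NameJp'] ?? null;
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return null; // missing field or bytes that are not valid UTF-8
    }
    return $value;
}
```

Calling `readNameJp($_POST)` in the handler would then give either a UTF-8 string or null, which makes it harder for wrongly encoded input to slip into the email unnoticed.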