I am helping a client convert their Perl flat-file bulletin board site from ISO-8859-1 to Unicode.
Since this is my first time, I would like to know if the following "checklist" is complete. Everything works well in testing, but I may be missing something which would only occur at rare occasions.
This is what I have done so far (forgive me for only including "summary" code examples):
Made sure files are always read and written in UTF-8:
use open ':utf8';
Made sure CGI input is received as UTF-8 (the site is not using CGI.pm):
s{%([a-fA-F0-9]{2})}{ pack ("C", hex ($1)) }eg; # Kept from existing code s{%u([0-9A-F]{4})}{ pack ('U*', hex ($1)) }eg; # Added utf8::decode $_;
Made sure text is printed as UTF-8:
binmode STDOUT, ':utf8';
Made sure browsers interpret my content as UTF-8:
Content-Type: text/html; charset=UTF-8 <meta http-equiv="content-type" content="text/html;charset=UTF-8">
Made sure forms send UTF-8 (probably not necessary as long as page encoding is set):
accept-charset="UTF-8"
Don't think I need the following, since inline text (menus, headings, etc.) is only in ASCII:
use utf8;
Does this looks reasonable or am I missing something?
EDIT: I should probably also mention that we will be running a one-time batch to read all existing text data files and save them in UTF-8 encoding.