The main difficulty is making sure all the data paths are UTF-8 clean:
Is your site DB-backed? If so, you'll need to convert all the tables to UTF-8 or some other Unicode encoding, so sorting and text searching work correctly.
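For example, on MySQL (an assumption; other databases differ) the conversion can be scripted by emitting one ALTER TABLE per table so you can review the statements before running them. A rough Python sketch, with placeholder table names:

```python
# Sketch for MySQL only; adjust for your database. Table names are examples.
tables = ["posts", "comments", "users"]   # replace with your schema's tables

for t in tables:
    # CONVERT TO re-encodes the stored data as well as the column charsets;
    # a bare DEFAULT CHARACTER SET would only affect columns added later.
    # On newer MySQL, prefer utf8mb4: MySQL's "utf8" tops out at 3 bytes
    # per character and can't store all of Unicode.
    print(f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8 "
          f"COLLATE utf8_general_ci;")
```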
Is your site using a programming language for dynamic content (PHP, mod_perl, ASP...)? If so, you'll have to make sure the interpreter you're using fully understands some form of Unicode, and work out the conversions if it isn't using UTF-8 natively -- UCS-2 is the next most common. Also check that it's configured to emit UTF-8 in its output to the web server.
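As a concrete illustration (in Python/WSGI rather than the languages above), the end state you want looks something like this: Unicode strings internally, UTF-8 bytes plus a charset declaration on the way out:

```python
# Illustrative WSGI app: whatever language you use, the point is that
# internal Unicode strings are encoded to UTF-8 bytes on the way out,
# and the charset is declared in the Content-Type header.
def app(environ, start_response):
    body = "Déjà vu: Unicode all the way out".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("localhost", 8080, app).serve_forever()
```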
Does your site have some kind of back-end app server? Does it use UTF-8?
EDIT: There are at least three different places you can declare the charset for a web document. Be sure you change them all:
- the HTTP Content-Type header
- the <meta http-equiv="Content-Type"> tag in your documents' <head>
- the XML declaration (<?xml ... ?>) at the top of the document, if you're serving XHTML
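For reference, the three declarations look like this; if they disagree, the HTTP header takes precedence over the in-document ones, so that's the first place to check:

```
HTTP header:        Content-Type: text/html; charset=utf-8
HTML <head>:        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
XML declaration:    <?xml version="1.0" encoding="utf-8"?>
```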
All this comes from my experiences a few years ago when I traced some Unicode data through a moderately complex N-tier app, and found conversion chains like:
Latin-1 -> UTF-8 -> Latin-1 -> UTF-8
Much of this was due to the less mature Unicode support at the time, but you can still find yourself messing with ugliness like this if you're not careful to make the pipeline UTF-8 clean.
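To see why those chains hurt, here's a minimal Python demonstration of one misstep: bytes that are already UTF-8 get misread as Latin-1 and re-encoded, producing the classic double-encoded mojibake:

```python
# What one link in the chain above does to a single character: a bogus
# "Latin-1" reinterpretation doubles the encoding instead of undoing it.
s = "é"                           # U+00E9
step1 = s.encode("utf-8")         # b'\xc3\xa9'  (correct UTF-8)
step2 = step1.decode("latin-1")   # 'Ã©'  (UTF-8 bytes misread as Latin-1)
step3 = step2.encode("utf-8")     # b'\xc3\x83\xc2\xa9'  (double-encoded)
print(step3.decode("utf-8"))      # prints 'Ã©', the classic mojibake
```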
As for your comments about searching out Latin-1 characters and converting files one by one, I wouldn't do that. I'd build a script around the iconv utility found on every modern Linux system, feed it every text file on the site, and explicitly convert each one from Latin-1 to UTF-8, something like the sketch below. Leave no stone unturned.
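A rough Python sketch of that script, assuming GNU iconv (for its -o flag) and an example list of file extensions; run it on a copy of the site first:

```python
#!/usr/bin/env python3
"""Walk a tree and convert every text file from Latin-1 to UTF-8 via iconv."""
import os
import subprocess
import sys

EXTENSIONS = {".html", ".css", ".js", ".txt", ".php"}  # example list; adjust

def convert_tree(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            tmp = path + ".utf8"
            # GNU iconv: -f source charset, -t target, -o output file.
            result = subprocess.run(
                ["iconv", "-f", "LATIN1", "-t", "UTF-8", "-o", tmp, path])
            if result.returncode == 0:
                os.replace(tmp, path)
            else:
                # iconv exits nonzero on input it can't convert; leave the
                # original file alone and flag it for manual inspection.
                if os.path.exists(tmp):
                    os.remove(tmp)
                print(f"FAILED: {path}", file=sys.stderr)

if __name__ == "__main__":
    convert_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```

One caveat: a file that is already UTF-8 will be double-encoded by a pass like this (exactly the mojibake shown earlier), so make sure each file goes through the converter only once.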