GNU diff doesn't seem to be smart enough to detect and handle UTF-16 files, which surprises me. Am I missing an obvious command-line option? Is there a good alternative?
From the GNU diff documentation:
Handling Multibyte and Varying-Width Characters
diff, diff3 and sdiff treat each line of input as a string of unibyte characters. This can mishandle multibyte characters in some cases. For example, when asked to ignore spaces, diff does not properly ignore a multibyte space character.
Also, diff currently assumes that each byte is one column wide, and this assumption is incorrect in some locales, e.g., locales that use UTF-8 encoding. This causes problems with the -y or --side-by-side option of diff.
These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.
The IBM GNU/Linux Technology Center Internationalization Team has proposed some patches to support internationalized diff http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz. Unfortunately, these patches are incomplete and are to an older version of diff, so more work needs to be done in this area.
I never realized that myself.
It looks like Guiffy could do the job if a non-free, non-command-line tool is acceptable. I'm still looking for a free command-line tool.
You could build something in Python with the excellent chardet library: detect each file's encoding, convert the files to UTF-8, and pass the results to GNU diff.
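Here is a minimal Python sketch of that approach. It is not a tested tool, and it rests on a few assumptions. decode_bytes and diff_as_utf8 are hypothetical helper names. UTF-16 (and UTF-8/UTF-32) files are recognized by their byte-order mark, and chardet is consulted, if it is installed, only for files without one, so BOM-less UTF-16 depends on chardet's guess. GNU diff must be on the PATH. The sketch writes UTF-8 temporary copies of both files, then runs diff -u on them with diff's --label option so the output shows the original file names:

```python
import codecs
import os
import subprocess
import tempfile


def decode_bytes(data: bytes) -> str:
    """Decode raw file contents to str, guessing the encoding.

    Byte-order marks (BOMs) identify UTF-32/UTF-16/UTF-8 files directly.
    chardet (optional, third-party) is consulted only when no BOM is
    present, with UTF-8 as the final fallback.
    """
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return data.decode("utf-32")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")
    try:
        import chardet  # optional dependency: pip install chardet
        encoding = chardet.detect(data)["encoding"] or "utf-8"
    except ImportError:
        encoding = "utf-8"
    try:
        # errors="replace" keeps a bad guess from aborting the whole diff.
        return data.decode(encoding, errors="replace")
    except LookupError:
        # chardet named a codec Python does not know; fall back to UTF-8.
        return data.decode("utf-8", errors="replace")


def diff_as_utf8(path_a: str, path_b: str) -> int:
    """Re-encode both files as UTF-8 temp copies and run `diff -u` on them.

    Returns diff's exit status: 0 = same, 1 = different, 2 = trouble.
    """
    tmp_paths = []
    try:
        for path in (path_a, path_b):
            with open(path, "rb") as f:
                text = decode_bytes(f.read())
            # newline="" writes the original line endings back unchanged.
            with tempfile.NamedTemporaryFile(
                "w", encoding="utf-8", newline="", suffix=".txt", delete=False
            ) as tmp:
                tmp_paths.append(tmp.name)
                tmp.write(text)
        # --label shows the real file names instead of the temp file names.
        cmd = ["diff", "-u", "--label", path_a, "--label", path_b, *tmp_paths]
        return subprocess.run(cmd).returncode
    finally:
        for p in tmp_paths:
            os.remove(p)
```

For example, diff_as_utf8("old.txt", "new.txt") (placeholder file names) prints a unified diff and returns diff's exit status. Because the BOM is stripped during decoding, a file saved with a BOM and an otherwise identical one saved without it should not show a spurious difference.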
vimdiff works quite nicely for this purpose.
I found it while reading this Stack Overflow answer.