views:

18

answers:

1

I have an HTML document stored in a file, with a UTF-8 encoding, and I want my extension to display this file in the browser, so I call loadURIWithFlags('file://' + file.path, flags, null, 'UTF-8', null); but it loads it as ISO-8859-1 instead of UTF-8. (I can tell because ISO-8859-1 is selected on the View>Character Encoding menu, and because non-breaking-space characters are showing up as an  followed by a space. If I switch to UTF-8 using the Character Encoding menu, then everything looks right.)

I tried including LOAD_FLAGS_BYPASS_CACHE and LOAD_FLAGS_CHARSET_CHANGE in the flags but that didn't seem to have any effect. I also checked that auto-detect was turned off, so that wasn't the problem either. Adding <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> to the document seems to have solved the problem, but I would expect that using the 'charset' argument of loadURIWithFlags should work just as well, so I'm wondering if I did something wrong in my initial attempt.

+1  A: 

You did the right thing and the only solution is to include encoding information inside the document because if you rely only on HTTP headers you will fail to load the document when the document is saved on disk (because there is no such thing as headers for files).

If you are the one saving the file you could add the UTF-8 BOM to the file in order to assure that it will be properly loaded by Firefox or other applications.

Sorin Sbarnea
Good idea! I guess I didn't realize that there was such a thing as a UTF-8 BOM -- I thought it was only for encodings like UTF-16 that always use at least two bytes per character. Wikipedia to the rescue! (I've added a link to your answer.)
MatrixFrog