Hi,

I have a problem with character encoding in some HTML pages. It seems that the cause of the problem is that some of the .html files are not saved as UTF-8 encoded files. Even though I have instructed Eclipse to save these files as UTF-8, when I open them in a browser, it indicates that the files are ISO-8859-1.

How can I change the encoding of these files to UTF-8?

UPDATE: I already have the following included in the head section of each webpage:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

I am using the Apache web server.

Thanks, Donal

+2  A: 

The problem with UTF-8 is that files usually carry no magic byte sequence at the beginning (a byte-order mark is optional and often omitted). Apart from the Content-Type header sent by the server, the browser's only chance to detect UTF-8 is the XML declaration, an HTML meta tag, or some heuristics as a fallback.

Make sure that there is either an XML encoding declaration or an HTML meta tag in the document:

<?xml version="1.0" encoding="utf-8"?>

on the very first line, before the DOCTYPE, if it's XHTML, or

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

in the head section.

DrJokepu
+1  A: 

You can use iconv to convert files from one character encoding to another.
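
For cases where iconv is not available, here is a minimal Python sketch of the same kind of conversion. It assumes the source file really is ISO-8859-1; the function name `convert_to_utf8` and its parameters are made up for illustration.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode the text file at src as UTF-8 and write it to dst."""
    # Decode using the encoding the file is actually stored in.
    text = Path(src).read_bytes().decode(source_encoding)
    # Write the same characters back out as UTF-8 bytes.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Be careful not to run this on a file that is already UTF-8: decoding UTF-8 bytes as ISO-8859-1 does not fail, it silently produces garbled characters.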

Adam Rosenfield
+1  A: 

Try adding

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

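to the head section of your HTML files, or ensure that your server is serving the files with a Content-Type HTTP header that names the charset. Without either of these, the browser can only guess at the character encoding. If both are present, the charset in the HTTP header takes precedence over the meta tag.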
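
To see which charset a given Content-Type header value declares, something like the following Python sketch may help. The helper name `charset_from_content_type` is hypothetical; it relies on the standard library's `email.message.Message` to parse the header.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset named in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()
```

For a live page, the response object returned by `urllib.request.urlopen` exposes the same method as `response.headers.get_content_charset()`.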

Aaron Novstrup
+1  A: 

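You may need to change the Content-Type header that your web server sends to the client. With Apache, the charset in this header can be set with the AddDefaultCharset directive (for example, AddDefaultCharset UTF-8) in the server configuration or an .htaccess file.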

Edit: While this did work for this particular situation, using a tool to change the file encoding as suggested by other posters may be a better solution in other situations. YMMV.


Instructions for saving as UTF-8 in Eclipse (which I realize you already have):

You should probably change the Default Encoding in your workspace for the HTML document.

This is for Eclipse 3.4. If you have a different version, this may be slightly different.

Go to Window -> Preferences.
In the Preferences window, go to General -> Content Types.
Expand 'Text' and select HTML. Near the bottom of the preferences window, enter UTF-8 in the 'Default Encoding' field, then click 'Update' on the right.

After this, all HTML files should be saved in UTF-8 format.

Akrikos
A: 

As far as I know, setting the character encoding in Eclipse does not actually convert the files -- it just tells Eclipse how you want them interpreted. Your best bet is to use a converter tool such as the one Adam suggested.
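
Before converting anything, it can be worth checking what is actually on disk. A rough Python sketch, assuming you read the file's raw bytes first (the name `looks_like_utf8` is made up for this example):

```python
def looks_like_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

You could call it as `looks_like_utf8(open(path, "rb").read())`. Note that a file containing only ASCII characters passes this check and is also valid ISO-8859-1, so a True result is only informative when the file contains non-ASCII characters.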

Aaron Novstrup
A: 

Hi, I have a problem with MS Word. When I try to open a file, it asks me to "select the encoding that makes your document readable". My file now looks like this: " Èí@#ËÌø ". How can I read the file in the correct format? Please suggest a solution to this problem.

shaik