Hi,

I created this static website in which each page has the following structure:

  1. Common stuff like header, menu, etc.
  2. Page specific stuff in main content div
  3. Footer

In the site linked above, all the common stuff is duplicated in each page. To improve maintainability, I refactored the pages to use server-side includes (SSI) so that the common parts are no longer duplicated. Each page now has this structure:

  1. SSI for common stuff like header, menu, etc.
  2. Page specific stuff in main content div
  3. SSI for footer

I uploaded the refactored site to this address and, as you can see, it didn't quite work out. For some reason, the accented French characters no longer display properly in the page-specific content area, though they display fine in the content pulled in via SSI.

The included header specifies the character set as:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

If I open one of the main content pages in a browser, it reports the character encoding as ISO-8859-1. I've tried adding a .htaccess file to the folder with these lines:

AddDefaultCharset UTF-8
AddCharset UTF-8 .shtml
AddCharset UTF-8 .html

But still those pesky French accents aren't displaying properly on the version of the site that uses SSIs.

Cheers, Don

A: 

Your HTML document uses UTF-8 encoding; try these character codes for your accented letters: http://www.tony-franks.co.uk/UTF-8.htm

John Rasch
But why does this only happen when using SSIs? I'm using UTF-8 in the non-SSI version and the accented letters display fine.
Don
Have you tried adding "AddCharset UTF-8 .shtml" to your httpd.conf file? I don't know whether this will work, but it's worth a try (assuming you're including .shtml files).
John Rasch
A: 

You are serving your pages as UTF-8, which is good, but at least some of each page is being pulled in from files that are not actually saved as UTF-8. SSI just inserts the raw bytes; it doesn't attempt to re-encode the included files so that their charset matches that of the file including them.

You need to go through all your html and include files in a text editor and make sure each one is saved as UTF-8.
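If there are many files, the re-saving can also be scripted. A minimal Python sketch, assuming that any file which is not already valid UTF-8 was saved as ISO-8859-1 (an assumption: Windows-1252 is also common, and ISO-8859-1 decoding never fails, so a wrong guess goes unnoticed). The function names and default suffixes are hypothetical; keep a backup before rewriting files in place:

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return data re-encoded as UTF-8, unchanged if it is already valid UTF-8."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8: nothing to do
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was saved in source_encoding.
        return data.decode(source_encoding).encode("utf-8")


def convert_tree(root: str, suffixes=(".html", ".shtml")) -> list[Path]:
    """Re-save every matching file under root as UTF-8; return the files that changed."""
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_bytes()
            converted = to_utf8(original)
            if converted != original:
                path.write_bytes(converted)
                changed.append(path)
    return changed
```

For example, `convert_tree("public_html")` (a placeholder directory name) would convert every .html and .shtml file under that folder and return the list of files it rewrote. Add the include files' extension to `suffixes` if they use a different one.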

As John mentioned, you can avoid encoding issues by using character references for all non-ASCII characters, but it's a tremendous pain.
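The tedious part of that approach can be automated. In Python, the standard `xmlcharrefreplace` error handler replaces every character the target encoding cannot represent with a numeric character reference, so encoding to ASCII yields markup that displays the same whatever charset the page is served as. A minimal sketch (the sample string is made up):

```python
def ascii_with_char_refs(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(ascii_with_char_refs("Déjà vu"))  # D&#233;j&#224; vu
```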

bobince
Thanks for the suggestion. In Eclipse (the editor I use), I changed the file encoding of all files to UTF-8, but the result is still the same. Is there a way to check whether Eclipse actually changed the encoding correctly?
Don
Try loading the files (even just as text) into a web browser, setting View->Character Encoding to ‘UTF-8’ and seeing if the accents display correctly. Even Notepad can do it, at a pinch, so I'd be surprised if Eclipse couldn't!
bobince
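The browser check described above can also be run over every file at once. A minimal Python sketch (the function names, suffixes, and directory are assumptions): it reports each file whose bytes are not valid UTF-8, which is exactly what would show up as garbled accents when the file is viewed as UTF-8. A file that passes is not proof of intent, but a file that fails is certainly not saved as UTF-8:

```python
from pathlib import Path


def first_bad_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if data is valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start


def report_non_utf8(root: str, suffixes=(".html", ".shtml")) -> None:
    """Print every matching file under root that is not valid UTF-8."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            offset = first_bad_offset(path.read_bytes())
            if offset is not None:
                print(f"{path}: not UTF-8 (first bad byte at offset {offset})")
```

Running `report_non_utf8(".")` from the site's root folder (a hypothetical layout) lists the files that still need converting.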
Is it the text in the ‘test/index.html’ file that comes out wrong, or the text in the includes? Have you tried dropping a ‘.htaccess’ file in the folder, containing the line ‘AddDefaultCharset UTF-8’? Currently the page is served as plain ‘text/html’ with no charset; not that it matters with the <meta> in place, but still.
bobince
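To see which charset, if any, the server actually declares (and so whether a .htaccess change took effect), the response headers can be inspected directly. A minimal Python sketch using only the standard library; the function names are hypothetical and the URL you pass in is up to you. `get_content_charset()` returns the lower-cased `charset` parameter, or `None` for a bare `text/html`:

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_from_content_type(value: str):
    """Extract the charset parameter from a Content-Type header value (None if absent)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def served_charset(url: str):
    """Send a HEAD request and return the charset declared in the HTTP Content-Type header."""
    with urlopen(Request(url, method="HEAD")) as response:
        value = response.headers.get("Content-Type")
        return charset_from_content_type(value) if value else None
```

If `served_charset(...)` returns `None`, the browser falls back to the `<meta>` declaration; if it returns a charset, browsers give that header precedence over the `<meta>` tag.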
The accents in the *included* files display correctly. It's the accents in the files that do the including that don't work. I tried adding a .htaccess file, but still no joy.
Don