I need to clean up some files containing French text. The problem is that the files erroneously contain multiple encodings within the same file.

I think some sections are ISO 8859-1 (Latin-1), but other parts have text encoded in single-byte characters that look like 'extended' ASCII. In other words, it is 7-bit ASCII plus the following:

  • 0x82 for é (e acute)
  • 0x8a for è (e grave)
  • 0x88 for ê (e circumflex)
  • 0x85 for à (a grave)
  • 0x87 for ç (c cedilla)

What encoding is this?

A: 

This website here shows a link with 0x87 for cedilla. I haven't looked much further than this, but I bet the rest of your information can be found there as well.

Michael Dorgan
That's capital-C-cedilla, and only mentions 0x87 as the second byte of a UTF-8 sequence by coincidence.
bobince
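
The coincidence mentioned in the comment above can be checked with a short Python sketch. Python is an assumption here; the thread does not name a language. It shows that 0x87 turns up as the trailing byte of the UTF-8 encoding of capital Ç, which is unrelated to the single-byte 0x87 in the question.

```python
# Capital C-cedilla (U+00C7) encoded as UTF-8 is the two-byte sequence C3 87.
encoded = "Ç".encode("utf-8")
print(encoded)  # b'\xc3\x87'

# The second byte happens to be 0x87, the same value the asker saw,
# but there it stood alone as a single-byte lowercase c-cedilla.
print(hex(encoded[1]))
```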
+3  A: 

That's the original IBM PC encoding, Code page 437.
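
As a rough check, the byte values from the question can be decoded with Python's built-in `cp437` codec. The sketch below also shows one possible way to clean a file that mixes both encodings. The `decode_mixed` helper is hypothetical, and its heuristic is an assumption: bytes 0x80 to 0x9F are control codes in Latin-1 and rarely appear in real text, so they are treated as CP437, and everything else as Latin-1. This happens to suit French, where the accented letters fall in 0x80 to 0x9F in CP437 and in 0xC0 to 0xFF in Latin-1, but it is not a general solution.

```python
# Byte values from the question and the characters the asker expects.
expected = {0x82: "é", 0x8A: "è", 0x88: "ê", 0x85: "à", 0x87: "ç"}
for byte, char in expected.items():
    # Each byte decodes to the expected letter under code page 437.
    assert bytes([byte]).decode("cp437") == char


def decode_mixed(data: bytes) -> str:
    """Decode bytes mixing Latin-1 and CP437 (heuristic, hypothetical helper)."""
    out = []
    for b in data:
        if 0x80 <= b <= 0x9F:
            # Assumption: C1 control range in Latin-1, so treat as CP437.
            out.append(bytes([b]).decode("cp437"))
        else:
            # ASCII and 0xA0-0xFF are read as Latin-1.
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)


# CP437 e-acute (0x82) and Latin-1 c-cedilla (0xE7) in the same byte string.
sample = b"caf\x82 fran\xe7ais"
print(decode_mixed(sample))  # café français
```

Other DOS code pages, such as 850, place these five letters at the same positions, so this check alone does not tell them apart.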

Michael Borgwardt
Yep. Don't see it much these days!
bobince
Thanks Michael.
Canoehead