Can I include characters such as "ã" and "ê" in UTF-8 encoded XML, or must it be UTF-16 encoded?

+6  A: 

You can encode those characters in UTF-8.

The key is to keep the encoding named in the XML declaration (<?xml version="1.0" encoding="utf-8" ?>) consistent with the encoding the file is actually saved in.
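As a rough illustration of why that consistency matters, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document and the helper name `parses` are made up for this example. The same text is encoded once as UTF-8 (matching the declaration) and once as ISO-8859-1 (contradicting it).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the declaration claims UTF-8.
DOC = '<?xml version="1.0" encoding="utf-8"?><word>maçã, você</word>'

def parses(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

consistent = DOC.encode("utf-8")       # bytes match the declaration
mismatched = DOC.encode("iso-8859-1")  # bytes contradict the declaration

print(parses(consistent))
print(parses(mismatched))
print(ET.fromstring(consistent).text)
```

With the mismatched bytes, the parser meets byte sequences that are not valid UTF-8 and rejects the document, which is the typical symptom of a file saved in one encoding but declared as another.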

The whole point of UTF-8 is to be able to encode all the Unicode characters in a smaller footprint. According to the source of all wisdom, Wikipedia, UTF-8 encodes each code point in 1 to 4 bytes and is backwards compatible with ASCII.
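A small Python sketch can make the 1 to 4 byte range concrete. The sample characters and the helper name `utf8_len` are arbitrary choices for this illustration, not part of the original answer.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# Arbitrary samples: ASCII, the two characters from the question,
# one character from the U+0800..U+FFFF range, and one above U+FFFF.
samples = ["A", "ã", "ê", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), hex {encoded.hex()}")

# ASCII compatibility: an ASCII character has the same bytes in both encodings.
print("A".encode("utf-8") == "A".encode("ascii"))
```

Both "ã" (U+00E3) and "ê" (U+00EA) come out as two bytes each in UTF-8.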

John Weldon
The "smaller footprint" doesn't apply to all characters: U+0800 to U+FFFF are larger in UTF-8 than in UTF-16.
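The size difference can be checked with a short Python sketch. It uses the "utf-16-le" codec so that no byte order mark is counted. The helper name `sizes` and the chosen code points are assumptions made for illustration.

```python
def sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# One code point per range: ASCII, U+0080..U+07FF, U+0800..U+FFFF, above U+FFFF.
for cp in (0x0041, 0x00E3, 0x0800, 0x20AC, 0x1F600):
    u8, u16 = sizes(chr(cp))
    print(f"U+{cp:04X}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

For the characters in the question, which fall in U+0080 to U+07FF, both encodings use two bytes, so neither is smaller.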
dan04
+5  A: 

All Unicode Transformation Format encodings can encode any character found in Unicode. The characters given are found in the Unicode standard.
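As a quick sanity check rather than a proof, this Python sketch round-trips the two characters from the question through Python's built-in UTF-8, UTF-16, and UTF-32 codecs. The helper name `round_trips` is made up for this example.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if encoding then decoding returns the original text unchanged."""
    return text.encode(encoding).decode(encoding) == text

text = "ãê"
for encoding in ("utf-8", "utf-16", "utf-32"):
    print(encoding, round_trips(text, encoding))
```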

Ignacio Vazquez-Abrams