ansaurus

Question

Answer 1

A:

<?xml version="1.0" encoding="utf-8"?>

should be fine for utf-8.
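A quick way to check this for yourself, sketched with Python's standard `xml.etree.ElementTree` (the helper name `parse_with_declaration` and the sample `<greeting>` element are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content containing one non-ASCII character (e with acute accent).
PAYLOAD = "<greeting>caf\u00e9</greeting>".encode("utf-8")


def parse_with_declaration(encoding_name: str) -> str:
    """Prefix the payload with an XML declaration and return the parsed text."""
    declaration = f'<?xml version="1.0" encoding="{encoding_name}"?>'.encode("ascii")
    root = ET.fromstring(declaration + PAYLOAD)
    return root.text


if __name__ == "__main__":
    # Both spellings of the encoding name should yield the same text.
    for name in ("utf-8", "UTF-8"):
        print(name, parse_with_declaration(name))
```

This only shows that one existing parser accepts the lowercase spelling; it says nothing about what other parsers do.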

Chouchenos 2010-10-19 09:54:58

+1 Why the downvote? This is correct.

sleske 2010-10-19 10:11:13

Answer 2

+3 A:

If all fails, read the spec :-).

4.3.3 Character Encoding in Entities

Each external parsed entity in an XML document may use a different encoding for its characters.

[...]

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997.

It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix.

Source: http://www.w3.org/TR/REC-xml/

So UTF-8 is written as encoding="UTF-8".

For other character sets not listed above, use the names given in the IANA character set list.

Case of the letters in the character set name is not significant: "However, no distinction is made between use of upper and lower case letters." (IANA character set list). So you could also write encoding="uTf-8" if you feel like it ;-).
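If you do end up writing your own parser, the case-insensitive handling could look roughly like the sketch below. It is an illustration, not a complete implementation: `declared_encoding` and `_ENCODING_DECL` are made-up names, the regular expression follows the spec's `EncName` production (a letter followed by letters, digits, `.`, `_` or `-`), and Python's `codecs.lookup` stands in for a lookup against the IANA list. It assumes the file starts in an ASCII-compatible encoding; UTF-16 and similar need byte-order-mark detection first, which is not shown here.

```python
import codecs
import re

# Matches an XML declaration at the very start of the input and captures the
# quoted encoding name. Assumption: the bytes are ASCII-compatible up to "?>".
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return a normalized codec name for the declared encoding.

    Falls back to `default` when there is no encoding declaration.
    Raises LookupError if Python does not know the declared name.
    """
    match = _ENCODING_DECL.match(head)
    name = match.group(2).decode("ascii") if match else default
    # codecs.lookup ignores letter case, so "uTf-8" and "UTF-8" give the same codec.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(declared_encoding(b'<?xml version="1.0" encoding="uTf-8"?><a/>'))
    print(declared_encoding(b"<?xml version='1.0' encoding='ISO-8859-1'?><a/>"))
    print(declared_encoding(b"<a/>"))
```

Normalizing the name once, right after reading the declaration, means the rest of the parser never has to compare encoding names in a case-sensitive way.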

BTW: Are you really, really certain you want to write your own XML parser? This sounds suspiciously like reinventing the wheel.

sleske 2010-10-19 09:55:18

+1 for 'read the spec', -1 for 'if all fails' (it should be the first port of call when writing a parser, not the last) and +1 again for 'reinventing the wheel' ;)

David Dorward 2010-10-19 09:57:40

@David Dorward Thanks :-). To be honest, in general I would not recommend the spec as the first port of call for a beginner; many specs can be rather daunting. But the spec is the place to go if you can't find the answer in a tutorial (or if you want to be certain what is right). Anyway, you probably noticed the smiley next to "if all fails".

sleske 2010-10-19 10:01:04

The smiley is next to *read the spec* :) Seriously though, the question suggests the goal is to write a general parser, so it needs to cover everything that it might be parsing, and that really *really* needs the spec as it lays out the requirements in technical terms. I'd be very surprised if anybody wrote documentation that provided enough information to write a parser that was aimed at beginners.

David Dorward 2010-10-19 10:33:38

As *sleske* said, it all goes to the IANA list: http://www.iana.org/assignments/character-sets Thanks a lot! I've been stupid not to find this in the spec. Yes, I need my own parser for some embarrassing reasons. Thanks, again!

Albus Dumbledore 2010-10-19 10:39:28


Setting encoding in XML files
