I have implemented a SAX parser in Java by extending DefaultHandler. The XML has an ñ in its content, and parsing breaks when it reaches this character. I print out the char array in the characters method, and it simply ends with the character before the ñ. The parser seems to stop after this: no other methods are called even though there is still much more content, i.e. the endElement method is never called again. Has anyone run into this problem before, or have any suggestions on how to deal with it?

+3  A: 

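What's the encoding of the file? Make sure the file's encoding declaration matches it. If the declaration is missing, the parser assumes UTF-8 (or UTF-16 when there is a byte order mark), so a file that was actually saved as ISO-8859-1 will break at the first non-ASCII character. You can set the encoding like so: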

<?xml version="1.0" encoding="UTF-8"?>

UTF-8 will cover that character; just make sure that's the encoding the file is actually saved in.
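
If you can't change the file, you can also tell the parser the encoding from code. Here is a minimal sketch, assuming the JDK's built-in SAX parser; the class name EncodingDemo, the helper parseText and the sample document are made up for illustration. It collects the text passed to characters, and when an encoding name is given it hands that to the parser through InputSource.setEncoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    // Hypothetical helper: parses the bytes and returns all character data seen.
    // A non-null encoding is passed to the parser as external encoding information.
    static String parseText(byte[] xml, String encoding) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Only the slice [start, start + length) belongs to this event.
                text.append(ch, start, length);
            }
        };
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "\u00f1" is ñ; the bytes are written as ISO-8859-1 on purpose.
        byte[] declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Espa\u00f1a</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] undeclared = "<name>Espa\u00f1a</name>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // Declaration matches the bytes, so no extra hint is needed.
        System.out.println(parseText(declared, null));

        // No declaration: supply the encoding from code instead.
        System.out.println(parseText(undeclared, "ISO-8859-1"));

        // No declaration and no hint: the parser is expected to reject the
        // byte sequence, which would match the symptom in the question.
        try {
            System.out.println(parseText(undeclared, null));
        } catch (Exception e) {
            System.out.println("parse failed: " + e);
        }
    }
}
```

If the parser "just stops" for you, it is worth checking whether an exception from parse is being caught and ignored somewhere, since that is how an encoding mismatch would normally show up.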

sblundy
+2  A: 

If you are saving your XML files in ASCII, you can only use the lower half (the first 128 characters) of the 8-bit character table. To include accented or other non-English characters in your XML, you will either have to save the file in UTF-8 or escape the characters with numeric character references, like &#241; for ñ.
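
If you generate the XML yourself, the escaping can be done in a few lines. This is only a sketch: the class XmlAsciiEscaper and the method escapeNonAscii are made-up names, and it assumes the input is already well-formed XML text (it does not touch & or <). Note that character references are only allowed in text content and attribute values, not in element or attribute names.

```java
public class XmlAsciiEscaper {

    // Hypothetical helper: replaces every code point above 127 with a
    // decimal character reference so the result is pure ASCII.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00f1" is ñ (code point 241).
        System.out.println(escapeNonAscii("<name>Espa\u00f1a</name>"));
        // prints <name>Espa&#241;a</name>
    }
}
```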

DrJokepu