I am parsing a weather data feed and it works with certain locations but errors out with this message on some locations:

09-22 10:40:33.364: WARN/System.err(3347): org.apache.harmony.xml.ExpatParser$ParseException: At line 465, column 29: not well-formed (invalid token)

Any ideas what might be happening?

Here is a snippet of the xml:

                <hour time="11 AM">
                    <url>http://www.....</url>
                    <obsdate>9/22/2010</obsdate>
                    <txtshort>Parcialmente soleado</txtshort>
                    <weathericon>03</weathericon>
                    <temperature>26</temperature>
                    <feelslike>29</feelslike>
                </hour>

                <hour time="12 PM">
                    <url>http://www.....</url>
                    <obsdate>9/22/2010</obsdate>
                    <txtshort>Parcialmente soleado</txtshort>
                    <weathericon>03</weathericon>
                    <temperature>26</temperature>
                    <feelslike>29</feelslike>
                </hour>

Line 465 is the 'hour' tag with the "12 PM" attribute value. I have added logging to the parse code, and it reads the XML up until it reaches this line.

A: 

The error is reported at column 29, but the line you've identified as the culprit is only 18 characters long. In all likelihood, that means one of two things. Either the line contains non-printing characters that we can't see, at least one of which is among the small handful of characters that XML does not allow, or there's an off-by-one error somewhere and the failure is actually on the next line, probably in the URL that you've redacted.
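
One way to test the "invisible character" theory is to scan the raw feed text for code points outside the XML 1.0 `Char` production before handing it to the parser. The following is only a sketch in Java (the question's stack trace comes from an Android parser): the class name `XmlCharScanner`, its method names, and the sample string are made up for illustration, and the line/column values it reports count characters from 1, which may not match how Expat counts columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class XmlCharScanner {

    /** True if the code point is permitted by the XML 1.0 Char production. */
    public static boolean isAllowedXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns one "line:column U+XXXX" entry per disallowed character (1-based positions). */
    public static List<String> findInvalidChars(String xml) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            if (!isAllowedXmlChar(cp)) {
                hits.add(String.format("%d:%d U+%04X", line, column, cp));
            }
            if (cp == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
            // Advance by one or two chars depending on whether this is a surrogate pair.
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample: a vertical tab (U+000B) hidden inside an attribute value.
        String sample = "<hours>\n<hour time=\"12" + (char) 0x0B + "PM\">\n</hour>\n</hours>";
        for (String hit : findInvalidChars(sample)) {
            System.out.println(hit);
        }
    }
}
```

If the scan reports nothing on the failing feed, the second explanation (the error is really on a neighbouring line) becomes more likely.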

Robert Rossney
That's not right. Before my edit (for readability) the line was 36 characters long; the 29th character is the 1.
Bobby
David Dorward
A: 

Before attempting to read any XML file, it's advisable to check that the document is well-formed. In this case, try wrapping a well-formedness check around the XML you get from the weather data feed before parsing it. In C#/.NET this can be done as follows:

XmlDocument doc = new XmlDocument();
doc.LoadXml(rawXMLcontent); // throws XmlException if the content is not well-formed

If loading fails, an exception is thrown and control passes to your exception block, where you can handle the bad feed accordingly. This ensures that a parse exception never goes unhandled. I hope it helps.
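
Since the question's stack trace comes from a Java parser, here is a rough Java counterpart of the same idea, using the standard JAXP `DocumentBuilder`. It is a sketch under assumptions: the class name `WellFormednessCheck` and the method `firstError` are invented for illustration, and the exact error message and position depend on the parser implementation. Note that this is the same kind of check the parser in the question is already performing when it throws.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class WellFormednessCheck {

    /**
     * Returns null if rawXml parses as well-formed XML,
     * otherwise a short description of the first error.
     */
    public static String firstError(String rawXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(rawXml)));
            return null;
        } catch (SAXParseException e) {
            // The parser reports where it gave up, much like the Expat message in the question.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Made-up inputs: the second hides a vertical tab (U+000B) in an attribute value.
        System.out.println(firstError("<hour time=\"12 PM\"></hour>"));
        System.out.println(firstError("<hour time=\"12" + (char) 0x0B + "PM\"></hour>"));
    }
}
```

Returning a description rather than rethrowing lets the caller log the position and skip or retry the feed for that location.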

A_Var
I think if you look at the error closely, you'll see that it's being thrown by the XML parser - in short, OP is already performing exactly the well-formedness check that you're suggesting.
Robert Rossney
@Robert Yeah, I do see that the error is related to the attribute value, which easily bypasses the well-formedness check.
A_Var
Easily bypasses? The error message *says* that the document's not well-formed.
Robert Rossney
The message says the token is not well-formed, not the document itself. There is a huge difference between a token and a document, I think. Checking for well-formedness checks the document but not the attribute values or CDATA values. I think the problem here is with the values and has nothing to do with the XML syntax, which is quite OK.
A_Var