For some reason DOMParser is adding some additional #text elements for each newline \n for this url

http://rt.com/Root.rss

...as well as many other RSS feeds I've tried. I checked the CNN and BBC feeds; they don't contain newlines, and DOMParser handles them nicely. So I have to add the following before parsing:

var xmlText = htmlText.replace(/\n[ ]*/g, "");
var xmlDoc = parser.parseFromString(xmlText, "text/xml");

The server is returning text/xml.

var channel = xmlDoc.documentElement.childNodes[0];

Without my code above, this returns the `\n` text node; with the correction, it returns the channel element.

A: 

What is your question? Do you wish to not use the workaround? I think the workaround is necessary as the parser is working as expected.

Delan Azabani
My thought is that the parser is not working as expected and the workaround is somewhat artificial. The parser should not insert `\n` nodes, so I thought maybe I'm misusing the parser functionality. I really want to avoid such workarounds.
Michael
+1  A: 

For some reason DOMParser is adding some additional #text elements for each newline \n for this url

That is standard behaviour. Only IE ignores whitespace between element nodes. (XML Whitespace Handling, Whitespace @ MSDN, Whitespace @ MDC)

Dormilich
A: 

Yes, that's what XML parsers are supposed to do by default. Get used to walking through child nodes checking to see whether they're elements (nodeType===1) or text nodes (3).
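A minimal sketch of that nodeType check, in case it helps. The helper name `elementChildren` is hypothetical (it is not part of any DOM API), and the sample XML string is made up for illustration:

```javascript
// Hypothetical helper: return only the element children of a node,
// skipping text nodes (including whitespace-only ones) and comments.
function elementChildren(node) {
    var result = [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === 1) {   // 1 === Node.ELEMENT_NODE
            result.push(child);
        }
    }
    return result;
}

// Usage in a browser, where DOMParser is available.
if (typeof DOMParser !== "undefined") {
    var sampleDoc = new DOMParser().parseFromString(
        "<rss>\n  <channel>\n    <title>Example</title>\n  </channel>\n</rss>",
        "text/xml");
    // childNodes[0] would be the "\n  " text node; this skips it.
    var channel = elementChildren(sampleDoc.documentElement)[0];
    console.log(channel.nodeName);   // "channel"
}
```

This leaves the parsed document untouched and simply ignores the whitespace text nodes when reading.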

From Firefox 3.5 you get the Element Traversal API, giving you properties like firstElementChild and nextElementSibling. This makes walking over the DOM whilst ignoring whitespace easier. Alternatively you could use XPath (doc.evaluate) to find the elements you want.
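Both approaches might look roughly like the sketch below. It assumes an RSS 2.0 layout (an `rss` root with a `channel` child in no namespace), which is an assumption about the feed, and the function names are hypothetical:

```javascript
// Element Traversal API: firstElementChild skips text and comment nodes,
// so whitespace between tags never shows up.
function firstElementOfRoot(xmlDoc) {
    return xmlDoc.documentElement.firstElementChild;
}

// XPath via document.evaluate. The path "/rss/channel" assumes an
// RSS 2.0 feed without namespaces on these elements.
function channelByXPath(xmlDoc) {
    var FIRST_ORDERED_NODE_TYPE = 9;   // value of XPathResult.FIRST_ORDERED_NODE_TYPE
    var result = xmlDoc.evaluate("/rss/channel", xmlDoc, null,
                                 FIRST_ORDERED_NODE_TYPE, null);
    return result.singleNodeValue;     // null if nothing matched
}

// Usage in a browser, where DOMParser is available.
if (typeof DOMParser !== "undefined") {
    var feedDoc = new DOMParser().parseFromString(
        "<rss>\n  <channel>\n    <title>Example</title>\n  </channel>\n</rss>",
        "text/xml");
    console.log(firstElementOfRoot(feedDoc).nodeName);   // "channel"
    console.log(channelByXPath(feedDoc).nodeName);       // "channel"
}
```

The XPath version keeps working if other elements are later added before `channel`, whereas `firstElementChild` depends on `channel` being the first element under the root.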

If you want to remove whitespace nodes for good, it's a much better idea to do it on the parsed DOM than by using a regex hack:

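// Recursively remove whitespace-only text nodes from a parsed DOM subtree.
function removeWhitespace(node) {
    // Iterate backwards so removals don't shift the indices still to visit.
    for (var i = node.childNodes.length; i-- > 0;) {
        var child = node.childNodes[i];
        if (child.nodeType === 3 && /^\s*$/.test(child.data))
            node.removeChild(child);    // whitespace-only text node
        else if (child.nodeType === 1)
            removeWhitespace(child);    // element: descend into its children
    }
}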
bobince
Firefox also has the .children property, which is a collection of all element children.
Dormilich