I am making an add-on for Firefox that loads an HTML page using AJAX (the add-on has its own XUL panel).

So far I have not looked into creating a document object, placing the AJAX response into it, and then using XPath to find what I need.
Instead I am loading the contents and parsing them as text with a regular expression.

But I have a question: which would be better to use, XPath or a regular expression? Which is faster?

The HTML page consists of hundreds of elements that contain the same text, and what I basically want to do is count how many such elements there are.

I want my add-on to work as fast as possible, and I do not know the mechanics behind regexp or XPath, so I don't know which is more effective.

Hope I was clear. Thanks.

+11  A: 

Whenever you are dealing with XML, use XPath (or XSLT, or DOM methods, or any other XML method to go through your data). Never use regular expressions for this task.

Why? XML processing is intricate, and dealing with all its oddities (external/parsed/unparsed entities, DTDs, processing instructions, whitespace handling, collapsing, Unicode normalization, CDATA sections, etc.) makes it very hard to create a reliable regex-based way of getting at your data. The fact that it has taken the industry years to learn how best to parse XML should be reason enough not to try doing this yourself.

To answer your question: when it comes to speed (which should not be your primary concern here), it depends heavily on the implementation of the XPath or regex compiler/processor. Sometimes XPath will be faster (e.g., when using keys, if possible, or compiled XSLT); other times regexes will be faster (if you can use a precompiled regex and your query is simple). But regexes are never simple with HTML/XML, because of the problem of matching nested parentheses (tags), which cannot be reliably solved with regexes alone.
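To illustrate the reliability point for the counting task in the question, here is a small sketch. The markup, the "sold out" text, and the `countByRegex` helper are all made up for the example; the real page will look different. Only two live elements contain the text, yet both patterns report more, because a regex sees characters and has no idea what is a comment or an attribute value.

```javascript
// Hypothetical sample markup: two live <li> elements contain "sold out",
// one is commented out, and one only mentions the phrase in an attribute.
const html = [
  '<ul>',
  '  <li class="item">sold out</li>',
  '  <li class="item">sold out</li>',
  '  <!-- <li class="item">sold out</li> -->',
  '  <li class="item" title="not sold out yet">in stock</li>',
  '</ul>'
].join("\n");

// Count all matches of a global regex in a string (0 when nothing matches).
function countByRegex(source, pattern) {
  const matches = source.match(pattern);
  return matches ? matches.length : 0;
}

// Naive: counts every occurrence of the text, wherever it appears.
const naive = countByRegex(html, /sold out/g);

// Stricter: requires a surrounding <li> tag, but still counts the
// commented-out element because the regex does not understand comments.
const stricter = countByRegex(html, /<li\b[^>]*>\s*sold out\s*<\/li>/g);

console.log("naive:", naive, "stricter:", stricter, "actual live elements: 2");
```

You can keep tightening the pattern, but each fix handles one more special case that a real parser already handles for you.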

If the input is huge, regex will tend to be faster, unless the XPath implementation can do streaming processing (which I believe is not how Firefox does it).

"which is more effective" >> the one that brings you quickest to a reliable and stable implementation that's comparatively speedy. Use XPath. It's what's used inside Firefox as well when it needs to deal with XML.
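For the counting itself, a minimal sketch might look like the following. It assumes you already have a parsed DOM document whose `evaluate` method follows the standard DOM XPath interface; `countByXPath` is a hypothetical helper name, and the commented usage lines assume an environment where `DOMParser` accepts `"text/html"`, which may not hold inside every XUL context or older Firefox version.

```javascript
// Numeric value of the DOM constant XPathResult.NUMBER_TYPE, written out so
// the helper does not depend on the XPathResult global being in scope.
const NUMBER_TYPE = 1;

// Hypothetical helper: wraps the expression in XPath's count() function and
// returns how many nodes in `doc` it selects.
function countByXPath(doc, expression) {
  const result = doc.evaluate("count(" + expression + ")", doc, null, NUMBER_TYPE, null);
  return result.numberValue;
}

// Possible usage in a browser-like environment (assumption: DOMParser
// supports "text/html" there; `responseText` is the AJAX response string):
//
//   const doc = new DOMParser().parseFromString(responseText, "text/html");
//   const n = countByXPath(doc, '//li[contains(., "sold out")]');
```

Letting `count()` run inside the XPath engine means you get a single number back instead of building a node list and looping over it in script.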

Abel
Thanks for the reply. Now I have another newbie question. Would you happen to know how to create a new HTML or XML document object inside the Firefox add-on's XUL? document.evaluate works only with XML and HTML documents, NOT XUL, so I need to somehow put the AJAX response text into a DOM document to be able to use XPath on it. I have spent 40 minutes searching for this but still failed to find an answer. I know I could load the contents into a new tab and access them there, but that is not what I want to do. Thanks. (Not sure if I should have created a new question instead of asking in a comment here.)
@aleluja: You should post this as a new question.
Alejandro