Hello everyone.

I am in a somewhat odd situation with my code. I am writing an Apache module that needs to add a comment inside the head tag of the response document (apart from doing some other unimportant stuff).

At the point where I need to parse the response document, I have the whole document in memory in the form of a char * buffer (I am using C), and I am not sure which API to choose.

DOM: as far as I know, it will create its own in-memory tree representation of the document. I can save some memory by freeing the earlier buffer once the tree is built.

SAX: I don't really understand it well.

XPath: from what I have found, I believe it can only be used to retrieve element values. If that is true, then it is of no use to me.

Can you give me some insight into which is best suited to this situation?

+1  A: 

DOM and SAX are "ways to parse the data". DOM parses the entire document and generates a data structure. SAX parses the document "element by element", letting you know when it encounters something interesting and expecting you to deal with it.
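To make the "letting you know when it encounters something interesting" part concrete, here is a minimal sketch of the event-driven style in plain C. It is a hand-rolled toy scanner, not a real SAX parser and not any library's API: the names `scan_start_tags`, `start_tag_fn`, and `count_tag` are made up for illustration, and it ignores CDATA, entities, and `<` inside attribute values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: invoked once per start tag.
 * The name is NOT NUL-terminated; use name_len. */
typedef void (*start_tag_fn)(void *user, const char *name, size_t name_len);

/* Toy event-driven scan: walks the buffer once and reports each start tag.
 * Comments, end tags, and declarations are skipped. No tree is built. */
void scan_start_tags(const char *buf, size_t len,
                     start_tag_fn on_start, void *user)
{
    size_t i = 0;
    while (i < len) {
        if (buf[i] != '<') { i++; continue; }

        /* Skip a comment as a unit so tags inside it are not reported. */
        if (len - i >= 4 && memcmp(buf + i, "<!--", 4) == 0) {
            size_t j = i + 4;
            while (j + 3 <= len && memcmp(buf + j, "-->", 3) != 0) j++;
            i = (j + 3 <= len) ? j + 3 : len;
            continue;
        }

        size_t start = i + 1;
        size_t end = start;
        while (end < len && isalnum((unsigned char)buf[end])) end++;

        /* A start tag has a letter right after '<'. */
        if (end > start && isalpha((unsigned char)buf[start]))
            on_start(user, buf + start, end - start);

        i = (end > start) ? end : i + 1;
    }
}

/* Example handler: counts the start tags it is told about. */
void count_tag(void *user, const char *name, size_t name_len)
{
    (void)name; (void)name_len;
    ++*(int *)user;
}
```

The caller passes the buffer, a handler, and a pointer to its own state (here an `int` counter). A DOM parser would instead hand back a complete tree after reading the whole buffer.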

XPath is a way to reference data in a DOM document once you have it, i.e., to say "the first node", etc. It's very powerful and wonderful, but it is not used for parsing.

As far as ease of use goes, DOM is far superior. However, it's much slower in many cases and takes up a lot more memory.

For me, the things I consider are based around whether the slowness and memory bloat of using DOM would impact my application:

  • Am I parsing very large document(s)?
  • Am I parsing many, many things?
  • Does speed actually matter?

Also worth noting is that, should you choose to use DOM, make sure you research what libraries are out there. A bad library can be 10x to 100x slower than a good one.

RHSeeger
+1  A: 

Regarding DOM vs SAX, remember that DOM adds latency to your processing.

DOM is easier, as it will create a structure automatically. In this structure, you'll add the data you want, and then you'll be able to generate the char* buffer from the structure with DOM. But you have to realize that you need to fully create the structure before you can add your data, and only then can you convert it back to char* to send it. This is where the latency is added.

Using SAX is more work. You work on the XML as it arrives. You don't even have to wait for the full char* data to be present to start working on it. You detect where you are in the document as soon as an element starts, and you are able to inject your additional data on-the-fly. There is very little latency added, and no data duplication.
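As a rough illustration of "detect where you are and inject on the fly", here is a minimal single-pass sketch in plain C. It is not a SAX parser and does not use libxml2: `insert_head_comment` and `tag_is` are hypothetical helper names, the input is assumed to be a NUL-terminated buffer, and the head start tag is assumed not to contain `>` inside an attribute value. `malloc` stands in for whatever allocator the module actually uses.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive check: does the tag opening at p (p[0] == '<')
 * have exactly this lowercase name? */
static int tag_is(const char *p, const char *name)
{
    size_t i = 0;
    for (; name[i] != '\0'; i++) {
        if (tolower((unsigned char)p[1 + i]) != name[i])
            return 0;
    }
    /* The name must end here, so "<header>" does not match "head". */
    char next = p[1 + i];
    return next == '>' || next == '/' || isspace((unsigned char)next);
}

/* Returns a newly malloc'd copy of html with comment spliced in right
 * after the first <head ...> start tag. Returns NULL if no such tag is
 * found or allocation fails. The caller frees the result. */
char *insert_head_comment(const char *html, const char *comment)
{
    const char *p = html;
    while ((p = strchr(p, '<')) != NULL) {
        if (strncmp(p, "<!--", 4) == 0) {
            /* Skip comments so a commented-out <head> is ignored. */
            const char *end = strstr(p + 4, "-->");
            if (end == NULL) return NULL;
            p = end + 3;
            continue;
        }
        if (tag_is(p, "head")) {
            const char *gt = strchr(p, '>');
            if (gt == NULL) return NULL;
            size_t prefix = (size_t)(gt - html) + 1; /* up to and including '>' */
            size_t clen = strlen(comment);
            size_t rest = strlen(html + prefix);
            char *out = malloc(prefix + clen + rest + 1);
            if (out == NULL) return NULL;
            memcpy(out, html, prefix);
            memcpy(out + prefix, comment, clen);
            memcpy(out + prefix + clen, html + prefix, rest + 1); /* copies the NUL */
            return out;
        }
        p++;
    }
    return NULL;
}
```

The scan stops at the first head start tag, so nothing after it is inspected. In a truly streaming setup you would write the prefix, the comment, and then the remaining data straight to the output instead of building a second buffer.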

I don't know much about XPath, but it's useless for parsing.

Didier Trosset
+1  A: 

In terms of working with XML (or HTML) and Apache, if you're doing simple enough things such as inserting a comment at a particular place in the document, it may be more efficient to work with XSL. This natively deals with XML documents, which includes XHTML (plain HTML qualifies only when it happens to be well-formed XML), without needing to convert them into some other format to work more easily with other programming languages. DOM and SAX parsing, on the other hand, each present the XML doc in a fashion that is easier to deal with, either by converting it to a native object in your particular language, or by registering "events" that your code can handle, respectively.

For a little bit more about XSL, take a look at http://www.w3schools.com/xsl/.

An additional thought: if you really are doing something basic like adding a comment to the head, it would be more efficient to use SAX parsing than DOM parsing, as a simple edit shouldn't require building a tree of the entire document. It would be handled more elegantly by waiting for the event that signals the start of the "head" element and then adding whatever you want to it.

nearlymonolith
Thanks! I will look into using SAX. libxml2 gives a great deal of code examples for DOM but none for SAX, which makes using SAX a bit difficult!
Abhinav Upadhyay