Hi,

I'm building a conforming and validating XML parser in C++ and trying to make it light-weight for use on Pocket PC.

At the beginning I decided to add some SAX-style "events" to my parser that report elements, processing instructions, etc.

These events are consumed by a derived class that builds the DOM tree of the XML document.
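A minimal sketch of that design, for illustration only: the names `XmlHandler`, `DomBuilder` and `Node` are hypothetical and not taken from any real parser, and the parser that would fire the events is left out.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SAX-like callback interface; the names are illustrative only.
struct XmlHandler {
    virtual ~XmlHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void processingInstruction(const std::string& target,
                                       const std::string& data) = 0;
};

// Minimal tree node: an element, a text leaf or a processing instruction.
struct Node {
    enum class Kind { Element, Text, PI };
    Node(Kind k, std::string n, std::string v)
        : kind(k), name(std::move(n)), value(std::move(v)) {}
    Kind kind;
    std::string name;   // element name, "#text", or PI target
    std::string value;  // text content or PI data
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
};

// Derived class that turns the event stream into a tree.
class DomBuilder : public XmlHandler {
public:
    DomBuilder()
        : root_(new Node(Node::Kind::Element, "#document", "")),
          current_(root_.get()) {}

    void startElement(const std::string& name) override {
        // New elements become the insertion point for what follows.
        current_ = append(Node::Kind::Element, name, "");
    }
    void endElement(const std::string&) override {
        // Step back up; the synthetic document node has no parent.
        if (current_->parent) current_ = current_->parent;
    }
    void characters(const std::string& text) override {
        append(Node::Kind::Text, "#text", text);
    }
    void processingInstruction(const std::string& target,
                               const std::string& data) override {
        append(Node::Kind::PI, target, data);
    }
    const Node& root() const { return *root_; }

private:
    Node* append(Node::Kind kind, const std::string& name,
                 const std::string& value) {
        std::unique_ptr<Node> node(new Node(kind, name, value));
        node->parent = current_;
        Node* raw = node.get();
        current_->children.push_back(std::move(node));
        return raw;
    }
    std::unique_ptr<Node> root_;
    Node* current_;
};
```

The parser would call `startElement`, `characters`, `endElement` and so on in document order, and `root()` then returns a synthetic document node holding the tree. Entities are deliberately not modelled here, since they are the part I have doubts about.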

My doubts arise mainly when handling entities (which can contain elements, PIs and comments, if so defined) and their resolution.

For example, I could create an XMLEntityRef class that refers to an XMLEntity defined in an XMLDocType object, as the .NET System.Xml parser does.

As far as I know, for most purposes an application only needs to know an element, its content, its attributes and their values, all as plain strings. It doesn't care whether the element content is made up of CDATA sections, entity references and/or plain text. The same applies to attribute values.

So my question is: what is the benefit of passing each XML object to the application as it appears and letting it (or a helper class) build, for example, the resulting attribute value by concatenating text and resolved entity references?

If you'll allow me a quick poll: does your application need to know about CDATA sections and where they are located in the XML file, or do you prefer to keep things easy and just get the full content of an element as a string, without worrying about how it was built?

Best regards, Mauro H. Leggieri

+1  A: 

Generally, XML is not lightweight. You are better off with JSON.

prime_number
Yes, because the JSON libraries for C/C++ are "so good"... For web development I agree with you 10000%. For other subjects? Hmm, not so much.
elcuco
Who said anything about libraries? He wants to write his own anyway. Besides, data is data, whether in web development or "other subjects". XML and JSON are just ways of formally defining the data you are transferring.
prime_number
+1  A: 

When building a parser, I do not think you should presume anything about how applications will consume the XML. Rather, provide the most granular level of data for each XML node, for maximum flexibility. While this may require more work on the part of consuming applications, they will be able to accomplish whatever they need to. Good luck.
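One way to offer both levels, as a rough sketch only: the parser emits granular events, and an optional adapter flattens them into plain strings for applications that want the easy view. `GranularHandler`, `CoalescingAdapter` and the entity map are hypothetical names invented for this example, and the adapter assumes leaf elements whose entities expand to plain text (no mixed content, no markup inside entities).

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical fine-grained events: one call per lexical piece of content.
struct GranularHandler {
    virtual ~GranularHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void text(const std::string& chars) = 0;
    virtual void cdata(const std::string& chars) = 0;
    virtual void entityRef(const std::string& entityName) = 0;
};

// Optional convenience layer: concatenates text, CDATA and resolved entity
// references into one string and reports it when the element ends.
class CoalescingAdapter : public GranularHandler {
public:
    using Callback = std::function<void(const std::string& element,
                                        const std::string& content)>;

    CoalescingAdapter(std::map<std::string, std::string> entities,
                      Callback callback)
        : entities_(std::move(entities)), callback_(std::move(callback)) {}

    void startElement(const std::string&) override { buffer_.clear(); }
    void endElement(const std::string& name) override {
        callback_(name, buffer_);
        buffer_.clear();
    }
    void text(const std::string& chars) override { buffer_ += chars; }
    void cdata(const std::string& chars) override { buffer_ += chars; }
    void entityRef(const std::string& entityName) override {
        // Simplification: unknown entities are skipped silently here;
        // a conforming parser would have to report an error instead.
        auto it = entities_.find(entityName);
        if (it != entities_.end()) buffer_ += it->second;
    }

private:
    std::map<std::string, std::string> entities_;
    Callback callback_;
    std::string buffer_;
};
```

With an entity `co` defined as `ACME`, the event sequence `text("Made by ")`, `entityRef("co")`, `cdata(" <fast>")` inside a `title` element reaches the callback as the single string `Made by ACME <fast>`. Applications that do care about CDATA positions or unexpanded references implement `GranularHandler` directly and skip the adapter.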

bill seacham
Thank you, Bill.
Calamardo
A: 

I'm building a conforming and validating XML parser in C++ and trying to make it light-weight

There is no such thing as a light-weight conforming (never mind validating) parser. To be a conforming parser you have to understand all the stuff that can go in a DTD external subset, which is gnarly work indeed. It is a shame that the XML specification ended up weighed down with all the SGML DTD crud, but we are stuck with it now.

does your application need to know about CDATA sections and where they are located in the XML file

Normally no. DOM Level 3 LS does require that CDATA sections be kept as CDATASection nodes in the DOM by default, but almost no application cares.

(If the question is about my application then yes, because my application is a templating system that keeps CDATA sections where they were. But still.)

My doubts arise mainly when handling entities

God yes. Entity references are a total disaster. Making a DOM implementation support them in a way which is compliant with DOM Level 3 Core/LS is very very complicated. Avoid if at all possible.

bobince
Well... by light-weight I was trying to picture the difference in code size between implementations. The Xerces code (excluding I/O stuff, etc.) is bigger than the Mono implementation of System.Xml. But, as you say, one can make a fully DOM Level 3 compliant parser. Well... what kind of app really takes advantage of a DOM L3 structure, apart from an XML editor?
Calamardo
Very few, but unfortunately you have to implement most of the hairy DTD stuff to get even a minimally-conforming XML parser. It's a shame XML dragged so much of the useless DTD nonsense out of SGML; I would love there to be a standard simplified subset of XML for lightweight work, dropping all the doctype stuff (along with default attributes and all entities) and allowing namespace declarations only on the root element.
bobince
Yes... there should be a lightweight specification... thanks.
Calamardo