I've tried to read http://www.w3.org/TR/xml-infoset/ and the wikipedia entry. But frankly I'm still not sure what the difference is.

The quote:

An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.

from the Wikipedia entry does not seem to make sense. How can a non-valid document have any semantics, and thus how can it be an 'information' set?

What is this 'infoset' that XML which is "well-formed and satisfies the namespace constraints" has? And in what way is it useful in itself? In other words, why is it, semantically speaking, necessary to define the XML infoset? Is there any information that cannot be represented in XML? If so, I can see the limiting set of the XML Infoset, but if not, surely the XML Infoset is as meaningless as the term 'information'?

Thank you for the interesting answers. I still cannot grasp why the XML infoset has any purpose as opposed to the term infoset, but you guys have given me the direct answer to the question.

+1  A: 

A valid XML document fulfills the requirements of a DTD or XSD (or another schema standard). A well-formed document can still be 'invalid' if it violates the rules in the given DTD or XSD.

Edit: I am new to this area of XML, but it looks like the infoset is the 'abstract level' description of the parts of an XML document, independent of the actual technical implementation, which could be, for example, a Document Object Model implementation.

mjustin
But what makes it an infoset as opposed to a vanilla XML document?
Preet Sangha
+3  A: 

XML is not text. XML "is" the XML infoset. This may then be serialized into text in an XML document, but it is the XML infoset that is the reality.

The infoset may exist in memory as a DOM tree, for instance. It exists in memory as the implementation of an abstract object model.

What if I serialized it as UTF-8 and then as UTF-16? Chances are the results would be two different sets of bits, but the same infoset.

Consider also that with text it makes sense to do things like string concatenation. You don't want to concatenate a "<" into the middle of an XML element. You have to encode it first. Why would you have to do this if it were just text? If you used the DOM, for instance, you'd just say element.InnerText = "<"; When serialized, the "<" would be encoded into "&lt;". Yet it's the same infoset.
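To make that concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is only an approximation: ElementTree models a small part of the infoset (element names, attributes, character content), and the `describe` helper and the `note` element are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Build the content in memory; no angle brackets are typed for the markup.
root = ET.Element("note")
root.text = "1 < 2"  # a raw "<" as character data

# Serialize the same in-memory tree in two different encodings.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")


def describe(elem):
    """Hypothetical helper: a rough stand-in for the items the infoset records."""
    return (elem.tag, dict(elem.attrib), elem.text, [describe(c) for c in elem])


# The byte streams differ, yet parsing either one gives back the same content.
print(utf8_bytes != utf16_bytes)
print(describe(ET.fromstring(utf8_bytes)) == describe(ET.fromstring(utf16_bytes)))
```

The serializer escapes the "<" as "&lt;" on the way out and the parser turns it back into "<" on the way in, so the escaping belongs to the text representation, not to the data.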

John Saunders
I cannot visualise this paradigm. In what way is XML not text? I'm not being facetious, but how does XML 'exist' without being represented with angle brackets?
Preet Sangha
Thank you. I appreciate the example. I did originally see the encoding aspect and the 'same information' aspect, but is this all an infoset is? What makes the XML Infoset distinct from any information definition?
Preet Sangha
+1 for examining the model independent of its bits. See also http://en.wikipedia.org/wiki/Theory_of_Forms
David Schmitt
@Preet Sangha: The infoset is the abstract data. XML is just one way of representing that data. The data could be represented in a completely different way, one that does not even look like pointy brackets in a text file, and it would still be the same data. It is a common mistake to think that XML actually *is* the data it represents. It is merely the serialized form.
Tomalak
@Tomalak: In which case this is an infoset. What makes it the XML infoset then?
Preet Sangha
It's an XML infoset because it's an infoset represented in XML.
Joren
+1  A: 

An XML infoset is an abstract set of concepts such as attributes and entities that can be used to describe a well-formed XML document. According to the specification, "An XML document's information set consists of a number of information items; the information set for any well-formed XML document will contain at least a document information item and several others."

Just because an XML document has an infoset does not mean it conforms to an XSD and is a valid XML document.

SteveChadbourne
Thank you. So what you're saying is that describing something with attributes and entities (i.e. things and things about things) makes it an XML infoset? I refer you to the original question: then why even bother to define such a thing? What needs it?
Preet Sangha
It allows the other XML standards to be described in terms of this abstract model instead of in terms of their effect on some concrete implementation. Consider the fact that there may be many concrete implementations, and the benefit becomes much more clear. You would have to describe XSLT multiple times to account for the separate implementations instead of describing it once, in terms of the infoset.
John Saunders
A: 

A good example I've just come across is in David Chappell's WCF PDF. This is how it works when using TCP for example:

To allow optimal performance when both parties in a communication are built on WCF, the wire encoding used in this case is an optimized binary version of SOAP. Messages still conform to the data structure of a SOAP message, referred to as its Infoset, but their encoding uses a binary representation of that Infoset rather than the standard angle-brackets-and-text format of XML. Using this option would make sense for communicating with the call center client application, since it’s also built on WCF, and performance is a paramount concern.

RichardOD
Cheers Rich, this is actually where my question originated. I cannot see what distinguishes the XML Infoset from the general case of the infoset in the case of a thing with attributes. Actually I feel stupid in that I'm the only person who cannot seem to see why the XML in XML infoset matters.
Preet Sangha
A: 

A useful way of thinking of the distinction between XML text and the XML infoset is to consider the Fast Infoset. This is a binary representation of the XML infoset.

So you have an abstract "infoset", which is a conceptual model representing XML data (nodes, elements, attributes, etc). This can be physically represented as a text XML document, or as a Fast Infoset stream. Both represent the same data, but in radically different ways.
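Fast Infoset support is not in Python's standard library, so the sketch below stays with text and shows the same idea on a smaller scale: two documents that are written differently (quote style, attribute order, how the "<" is escaped) but carry the same items. The `items` helper and the sample documents are assumptions for illustration, and ElementTree only approximates the infoset.

```python
import xml.etree.ElementTree as ET

# Two different character sequences...
doc_a = '<item id="7" lang="en">a &lt; b</item>'
doc_b = "<item lang='en' id='7'>a &#60; b</item>"


def items(elem):
    """Hypothetical helper: element name, attributes (unordered), text, children."""
    return (elem.tag, dict(elem.attrib), elem.text, [items(c) for c in elem])


# ...that parse to the same element, attributes and character content.
print(doc_a == doc_b)
print(items(ET.fromstring(doc_a)) == items(ET.fromstring(doc_b)))
```

A binary encoding such as Fast Infoset takes this one step further: the bytes no longer resemble text XML at all, but a decoder would hand back the same items.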

skaffman
Thank you, but I still have a problem comprehending what makes the XML infoset different from the general case of an infoset. I'll take a look at that and see.
Preet Sangha
I'll try to be clearer. Is it the case that XML => elements and attributes? In that case it makes sense; however, I originally perceived the concept of XML as a specialisation of the general case of the infoset (i.e. describing information). Now it seems to be the case that XML is the generalisation of that concept, in which case the XML infoset is THE infoset. Hence my inability to comprehend the semantics.
Preet Sangha