views:

42

answers:

0

JSON and XML are both frequently called to be context-free languages - they are both specified mainly by a formal grammar in EBNF. However this is only true for JSON as defined in RFC 4329, section 2.2 which does not require uniqueness of object keys (many may not know but {"a":1,"a":2} is valid JSON!). But if you require unique keys in JSON or unique attribute names in XML this cannot be expressed by a context-free grammars. But which is the language class of JSON with unique keys and for well-formed XML (which implies unique attribute names?).

One of the best paper I found on this subject (Murato et al, 2001: Taxonomy of XML Schema Languages using Formal Language Theory) explicitly excludes integrity constraints such as keys/keyrefs and uniqueness to be checked on an additional layer. Beside this the subset of XML defined by an XML Schema or by a DTD is context-free. But not the full set of all well-formed XML documents.

I think a nested stack automaton (=indexed language) should be able to parse JSON with unique key constraint. For XML can simlify the question to the language S of all comma-seperated lists of unique integers. Does anyone know more, preferably with citations?

P.S: A simple algorithm to decide the languages (beside the context-free part) is based on a good sorting algorithm. Therefore it should be decidable in "linearithmic time" with O(n log n) worst case. I have not found out yet, whether the complexity class is for instance "mildly context-sensitive", or "indexed" but probably something between context-free and context-sensitive (?).