Technical choices in unmarshalling hash-consed data

views:

102

answers:

+5 Q:

Technical choices in unmarshalling hash-consed data

There seems to be quite a bit of folklore knowledge floating about in restricted circles about the pitfalls of hash-consing combined with marshalling-unmarshalling of data. I am looking for citable references to these tidbits.

For instance, someone once pointed me to library aterm and mentioned that the authors had clearly thought about this and that the representation on disk was bottom-up (children of a node come before the node itself in the data stream). This is indeed the right way to do things when you need to re-share each node (with a possible identical node already in memory). This re-sharing pass needs to be done bottom-up, so the unmarshalling itself might as well be, too, so that it's possible to do everything in a single pass.

I am in the process of describing difficulties encountered in our own context, and the solutions we found. I would appreciate any citable reference to the kind of aforementioned folklore knowledge. Some people obviously have encountered the problems before (the aterm library is only one example). But I didn't find anything in writing. Even the little piece of information I have about aterm is hear-say. I am not worried it's not reliable (you can't make this up), but "personal communication" and "look how it's done in the source code" are considered poor form in citations.

I have enough references on hash-consing alone. I am only interested in references where it interferes with other aspects of programming, such as marshalling or distribution.

+2 A:

I found one reference on marshalling in functional languages; not sure if it will be useful, but the authors are smart: http://tinyurl.com/yc3hob9

I believe that Matthias Blume and/or Andrew Appel did something on this, but I can't find the paper. I also believe I reviewed something once for the Journal of Functional Programming, but I can't remember if the paper was accepted or who wrote it.

I suggest you ask Matthias Blume, Andrew Appel, and Phil Wadler if they can help.

Norman Ramsey 2010-02-27 05:13:18

+2 A:

OK, this is not much more use, but Andrew Kennedy wrote a functional pearl called simply Pickling Combinators, which appears in the Journal of Functional Programming, (2004), 14:6:727-739. There is extensive discussion of structure sharing and how it is handled in pickles, but no direct discussion of how this problem might relate to hash-consing in the implementation of the language. But the article does discuss structure sharing in memory as well as in a pickle, so I hope it is better than nothing.

Martin Elsman had a follow-on paper in 2005 in Trends in Functional Programming; the title is Type-specialized serialization with sharing. The article deals primarily with hash-consing by the unpickler (deserializer), not with hash-consing in the impelementation, but again it may be worth something.

The JFP paper is proprietary, but there appears to be a preprint on Andrew's web page. Elsman's paper appears to be available through Google Scholar at http://tinyurl.com/yd5tw2b.

(In a previous life, I worked on a project to create ASCII pickles that people could read and edit. I stupidly failed to publish it, but I have retained an interest.)

Norman Ramsey 2010-03-04 01:05:34

I really like Andrew Kennedy's article, and even though it does not mention of hash-consing, it is the best background against which to explain our own solution.

Pascal Cuoq 2010-03-04 07:07:22

Coq V5.10 had hash-consing and marshaling/unmarshaling. I didn't find anything in published form but the unmarshaling steps would be referenced as "reinterning" in the source code. Coq unmarhsaled values and then traversed them in order to re-create sharing, the obvious and only solution when all the language provides is an unmarshal function of type int_channel -> 'a.

Pascal Cuoq 2010-03-20 21:28:22

ansaurus

tags:

views:

answers:

Technical choices in unmarshalling hash-consed data

related questions