OP appears to be interested in languages that process trees and have some parallelism support.
Our DMS Software Reengineering Toolkit is a general-purpose program analysis and transformation engine. It parses programming language source text into trees (OP's first interest), enables analysis and processing of those trees, and regenerates source text from those trees.
DMS is driven by explicit grammars that describe the language being processed, and it has a lot of production-quality language definitions available for it.
DMS's tree processing machinery provides support for parallelism at two different levels. The first is supported directly by DMS's underlying parallel programming language, PARLANSE, which is inspired by LISP but isn't nearly so dynamic.
PARLANSE provides for "teams" of parallel activities called "grains" (as in sand on a beach; the idea is that there are lots of them). Teams may be structured dynamically, by (classically) forking new grains and adding them to a (dynamic) team; this is useful but not spectacularly fast. Teams may also be structured statically, including "fork this fixed-size set of grains as pure parallel":
(|| a b c)
and "create a set of grains whose execution order is controlled by a specified partial order" (!| ... ). You write the partial order directly in the fork call:
(!| step1 a
    step2 b
    step3 (>> step1 step2) c
    step4 (>> step2) d )
which encodes the fact that action c occurs only after (>>, "later in time than") actions a and b complete, and that d occurs only after b completes. The statically formed teams are precompiled by the PARLANSE compiler into pretty efficient structures, in that the grains can be launched and killed very quickly, allowing pretty small grain sizes (a few hundred instructions). Standard parallelism libraries have much higher overhead than this.
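For readers without PARLANSE at hand, here is a rough C++ analog of that partial-order fork, sketched with std::async futures; the step names and actions a..d come from the snippet above, everything else is illustrative, and a thread-based sketch like this has far higher per-grain overhead than PARLANSE's precompiled teams:

#include <future>
#include <iostream>

void a() { std::cout << "a\n"; }
void b() { std::cout << "b\n"; }
void c() { std::cout << "c\n"; }
void d() { std::cout << "d\n"; }

int main() {
    // step1 and step2 have no predecessors: launch immediately.
    std::shared_future<void> step1 = std::async(std::launch::async, a);
    std::shared_future<void> step2 = std::async(std::launch::async, b);
    // step3 (>> step1 step2): c may start only after a and b complete.
    auto step3 = std::async(std::launch::async,
                            [=] { step1.wait(); step2.wait(); c(); });
    // step4 (>> step2): d may start only after b completes.
    auto step4 = std::async(std::launch::async,
                            [=] { step2.wait(); d(); });
    step3.wait();
    step4.wait();
}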
The basic method for processing trees is pretty conventional: there's a PARLANSE library of facilities for inspecting tree nodes, walking up and down the tree, and creating new nodes and splicing them into place. Recursive procedures are often used to visit/update a tree.
The key observation here is that a tree visit which visits some set of children can be coded sequentially, or easily coded as a static parallel team. So it is pretty easy to manually code parallel tree visits, at the cost of writing lots of special cases to parallel-fork for each tree node type. That gives the "divide and conquer" that seems of interest to the OP. (You can of course enumerate a node's children and use a dynamic team to fork a grain for each child, but this isn't as fast.) This is the first level of parallelism used to process trees in DMS.
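As an illustration (again in C++ rather than PARLANSE, with a hypothetical Node type), a statically forked visit over a binary node looks roughly like this; the real thing has one such fork case per node type/arity:

#include <future>

struct Node {               // hypothetical two-child node for brevity
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
};

int sumTree(Node* n) {
    if (!n) return 0;
    // Static two-way fork over the children, analogous to (|| a b):
    auto l = std::async(std::launch::async, sumTree, n->left);
    auto r = std::async(std::launch::async, sumTree, n->right);
    return n->value + l.get() + r.get();
}

int main() {
    Node leaf1{1}, leaf2{2};
    Node root{3, &leaf1, &leaf2};
    return sumTree(&root) == 6 ? 0 : 1;
}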
The second level of parallelism comes through a DMS-supplied DSL that implements attribute grammars (AGs).
AGs are functional languages that decorate the BNF with a set of computations. One can write a simple calculator with an AG in DMS:
sum = sum + product;
<<Calculator>>: sum[0].result = sum[1].result + product.result;
This causes an attribute, "result", to be computed for the root (sum[0]) by combining the "result" attributes of the first child (sum[1]) and the second child (product).
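In conventional terms, each such AG equation is just a function from child attributes to the parent attribute. A minimal C++ rendering of the "result" attribute (with an illustrative Expr type, not DMS's generated code) would be:

#include <memory>

struct Expr {                        // illustrative AST node
    int leafValue = 0;               // used when there are no children
    std::unique_ptr<Expr> lhs, rhs;  // sum[1] and product, when present
};

// Synthesized attribute: the parent's value is computed from the children.
int result(const Expr& e) {
    if (!e.lhs) return e.leafValue;  // leaf case
    // sum[0].result = sum[1].result + product.result;
    return result(*e.lhs) + result(*e.rhs);
}

int main() {
    Expr e;
    e.lhs = std::make_unique<Expr>();
    e.lhs->leafValue = 2;
    e.rhs = std::make_unique<Expr>();
    e.rhs->leafValue = 3;
    return result(e) == 5 ? 0 : 1;
}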
So-called synthesized attributes propagate information up the tree from the leaves; inherited attributes propagate information down from the parents. Attribute grammars in general, and DMS's AGs in particular, allow mixing these, so information can flow up and down the tree in arbitrary order.
Most AGs are purely functional, i.e., no side effects; DMS's allows side effects, which complicates the notation but is pretty useful in practice. A nice example is construction of symbol tables by attribute evaluation: the current scope is passed down the tree, local declaration blocks create new scopes and pass those down, and individual declarations store symbol table data into the symbol table entry received from a parent.
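A sketch of that symbol-table idiom, once more in C++ with hypothetical types (DMS's actual symbol-table machinery is in PARLANSE and considerably richer): the current scope is the inherited attribute flowing down, and declaration nodes perform the insertion side effect.

#include <map>
#include <memory>
#include <string>
#include <vector>

struct Scope {
    Scope* parent = nullptr;
    std::map<std::string, std::string> symbols;  // name -> type
};

struct AstNode {
    bool opensScope = false;                     // e.g., a { ... } block
    std::string declName, declType;              // set on declaration nodes
    std::vector<std::unique_ptr<AstNode>> children;
    std::unique_ptr<Scope> localScope;           // created when opensScope
};

// currentScope flows down (inherited); declarations write into it (side effect).
void buildSymbols(AstNode& n, Scope& currentScope) {
    Scope* scope = &currentScope;
    if (n.opensScope) {
        n.localScope = std::make_unique<Scope>();
        n.localScope->parent = &currentScope;
        scope = n.localScope.get();
    }
    if (!n.declName.empty())
        scope->symbols[n.declName] = n.declType;  // the side effect
    for (auto& child : n.children)
        buildSymbols(*child, *scope);
}

int main() {
    Scope globals;
    AstNode block;
    block.opensScope = true;
    auto decl = std::make_unique<AstNode>();
    decl->declName = "x";
    decl->declType = "int";
    block.children.push_back(std::move(decl));
    buildSymbols(block, globals);
    return block.localScope->symbols.count("x") == 1 ? 0 : 1;
}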
DMS's AG compiler analyzes the attribute computations, determines how information flows across the entire tree, and then generates a parallel PARLANSE program to implement the information flow per tree node type. It does a data-flow analysis on each AG rule to determine the information flow: what has to happen first, vs. later, vs. last. For our simple sum rule above, it should be clear that the attributes of the children have to be computed before the attribute of the root. It turns out that PARLANSE's partial-order construct is a perfect way to encode this information, and it even handles nicely the side effects we added to AGs.
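To make the generated schedule concrete, here is what the partial order for the sum rule amounts to, sketched with C++ futures; the child evaluators are stand-ins, and DMS of course emits PARLANSE (!| ...) teams, not C++:

#include <future>

// Stand-ins for the recursive child-attribute evaluators that get generated.
int sumChildResult()     { return 2; }
int productChildResult() { return 3; }

int main() {
    // The two child attributes do not depend on each other: fork in parallel.
    auto r1 = std::async(std::launch::async, sumChildResult);
    auto r2 = std::async(std::launch::async, productChildResult);
    // The root attribute depends on both, like step3 (>> step1 step2) above.
    int rootResult = r1.get() + r2.get();
    return rootResult == 5 ? 0 : 1;
}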
The result is that DMS compiles an AG specification into a highly parallel PARLANSE program. Our C++ front end's name/type resolver is implemented as about 150K lines of DMS AG (yes, it takes that much to describe how C++ type information is used), which compiles to some 700K SLOC of parallel PARLANSE code. That code works (and runs in parallel on x86 multicore machines) without any extra thought or debugging of the parallelism, and it seems to scale nicely.