The name for Lisp derives from LISt Processing. Linked lists are the major data structure of Lisp languages, and Lisp source code is itself made up of lists. As a result, Lisp programs can manipulate source code as a data structure (this is known as homoiconicity).

However, a list is by definition a sequential construct. This encourages us to solve problems using sequential language idioms (algorithms that process one thing at a time and accumulate the results). For example, in a Lisp where cons cells are used to implement singly-linked lists, the car operation returns the first element of the list, while cdr returns the rest of the list. My vision is of a functional language for parallel execution that splits problems into roughly equal sub-problems, recursively solves them, and combines the sub-solutions.
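
A rough Common Lisp sketch of the contrast I have in mind (illustrative only, not a proposal for Treep syntax):

    ;; Sequential idiom: consume a list one element at a time via car/cdr.
    (defun sum-list (xs)
      (if (null xs)
          0
          (+ (car xs) (sum-list (cdr xs)))))

    ;; Divide-and-conquer shape: split into roughly equal halves, solve each
    ;; half recursively, and combine the sub-results.  The two recursive calls
    ;; are independent, so a parallel runtime could evaluate them concurrently.
    (defun sum-range (v lo hi)
      (if (= (- hi lo) 1)
          (aref v lo)
          (let ((mid (floor (+ lo hi) 2)))
            (+ (sum-range v lo mid)
               (sum-range v mid hi)))))

In a parallel language of the kind I'm imagining, the divide-and-conquer shape would be the idiomatic default rather than something you have to contort lists into.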

Pretty much every programming language's code is already parsed into trees; is there a homoiconic language like Lisp, but with trees as the major data structure? btw, I'd call it Treep, for TREE Processing.

Update: An interesting presentation (PDF) from 2009 by Guy Steele on parallel algorithms & data structures, Organizing Functional Code for Parallel Execution: or, foldl and foldr Considered Slightly Harmful.

+5  A: 

I don't see that the change would be very profound. Lisp certainly doesn't have any problem with letting lists be members of other lists, so it can easily represent trees, and algorithms on trees.

Conversely, every list can be regarded as a tree of a particular shape (in various ways).

reinierpost
The profundity of a language that uses the tree as its base data structure is _effective parallelism_. Of course, once you put lists inside your lists, you've got a tree. You could also go the other way and say that Lisp is really about "cons cell processing"...
Treep
I agree ... but I guess "Consp" or "Pairp" isn't as nice a name.
reinierpost
By the way: do have a look at PostScript. It is stack-based (like Forth) and has dictionaries as a basic data structure.
reinierpost
Another interesting language is [Fortress](http://projectfortress.sun.com/Projects/Community), and its "implicit parallelism"...
Treep
A: 

I would say that the major data structure of Lisp languages is the cons cell. One of the things you can easily build with cons cells is a linked list, but that's by no means the only data structure.

A cons cell is a pair of data items, but there's nothing that says a value has to be in the left cell and a pointer in the right cell (as in a linked list). If you allow both cells to contain either values or pointers themselves, it's easy to build binary (or with a bit more work, n-ary) tree structures. Building upon these structures, one can build dictionaries or B-trees or any other data structure you might think of.
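
For example (a minimal Common Lisp sketch added here for illustration), a small binary tree of conses and a recursive walk over it:

    ;; A cons cell is just a pair; nothing forces the "value on the left,
    ;; pointer on the right" convention of a linked list.
    (defparameter *tree*
      (cons (cons 1 2)              ; left subtree: leaves 1 and 2
            (cons 3 (cons 4 5))))   ; right subtree: leaf 3 plus a nested pair

    ;; Walking the tree is ordinary recursion on car and cdr.
    (defun tree-sum (node)
      (if (consp node)
          (+ (tree-sum (car node)) (tree-sum (cdr node)))
          node))

    (tree-sum *tree*)  ; => 15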

Greg Hewgill
Indeed, as I said in a comment to reinierpost, Lisp is really about "cons cell processing".
Treep
+4  A: 

Lisp Lists ARE trees, and Lisp code is a tree, just like any other code.

(+ (* 1 3) (* 4 6))

is a tree:

     +
    / \
   /   \
   *   *
  / \ / \
  1 3 4 6

And it's not just binary trees.

(+ 1 2 3)

   +
  /|\
 / | \
1  2  3

So, perhaps Lisp is your answer as well as your question.

Will Hartung
...and if you go the other way, you can say that Lisp lists are really cons cells; however, my question is more about the design of a language that allows for more idiomatic (parallel) processing of trees. For example, `(+ (* 1 3) (* 4 6))` could evaluate `(* 1 3)` and `(* 4 6)` in parallel, and language-level constructs would provide even more effective parallelism.
Treep
@Treep: executing code in parallel is a different thing from processing tree data in parallel.
Rainer Joswig
A: 

The canonical example of a homoiconic tree-processing language is XSLT. But you probably don't want to write (or read) anything substantial in it.

Jörg W Mittag
+1  A: 

OP appears to be interested in languages that process trees and have some parallelism support.

Our DMS Software Reengineering Toolkit is a general-purpose program analysis and transformation engine. It parses programming language source text into trees (OP's first interest), enables analysis and processing of those trees, and regenerates source text from those trees. DMS is driven by explicit grammars that describe the language being processed, and it has a lot of production-quality language definitions available for it.

DMS's tree-processing machinery provides support for parallelism at two different levels. The first is supported directly by DMS's underlying parallel programming language, PARLANSE, which is inspired by LISP but isn't nearly so dynamic.

PARLANSE provides for "teams" of parallel activities called "grains" (as in sand on a beach, the idea is that there are lots of grains). Teams may be structured dynamically, by (classically) forking new grains and adding them to a (dynamic) team; this is useful but not spectacularly fast. Teams may be structured statically, including "fork this fixed size set of grains as pure parallel":

(|| a b c)

and "create a set of grains whose execution order is controlled by a specified partial order" (!| ... ). You write the partial order directly in the fork call:

(!| step1 a
    step2 b
    step3 (>> step1 step2) c
    step4 (>> step2) d )

which encodes the fact that action c occurs after (later in time, hence >>) actions a and b complete. The statically formed teams are precompiled by the PARLANSE compiler into pretty efficient structures, in that the grains can be launched and killed very quickly, allowing pretty small grain sizes (a few hundred instructions). Standard parallelism libraries have much higher overhead than this.

The basic method for processing trees is pretty conventional: there's a PARLANSE library of facilities for inspecting tree nodes, walking up and down the tree, creating new nodes and splicing them in place. Recursive procedures are often used to visit/update a tree.

The key observation here is that a tree visit which visits some set of children can be coded sequentially, or just as easily coded as a static parallel team. So it is pretty easy to manually code parallel tree visits, at the cost of writing lots of special cases to parallel-fork for each tree node type. This is the "divide and conquer" that seems of interest to the OP. (You can of course enumerate a node's children and use a dynamic team to fork grains for each child, but this isn't as fast.) This is the first level of parallelism used to process trees in DMS.
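
For a rough illustration of that shape (a sketch in ordinary Common Lisp using the lparallel library, not PARLANSE or actual DMS code), a divide-and-conquer visit over a cons-cell tree might look like this:

    ;; Sketch only: the divide-and-conquer visit described above, written with
    ;; lparallel rather than PARLANSE grains.
    (ql:quickload :lparallel)
    (setf lparallel:*kernel* (lparallel:make-kernel 4))   ; worker pool

    ;; defpun defines a function whose plet bindings may evaluate in parallel:
    ;; each node's children are visited as independent sub-tasks, then combined.
    (lparallel:defpun count-nodes (node)
      (if (consp node)
          (lparallel:plet ((left  (count-nodes (car node)))
                           (right (count-nodes (cdr node))))
            (+ 1 left right))
          1))

The two plet bindings play the role of a two-grain static team: both children are visited as independent tasks and joined before their results are combined at the parent.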

The second level of parallelism comes through a DMS-supplied DSL that implements attribute grammars (AGs).

AGs are functional languages that decorate the BNF with a set of computations. One can write a simple calculator with an AG in DMS:

   sum = sum + product;
        <<Calculator>>:  sum[0].result = sum[1].result + product.result;

This causes an attribute, "result", to be computed for the root (sum[0]) by combining the result attribute of the first child (sum[1]) with that of the second child (product.result). So-called synthesized attributes propagate information up the tree from the leaves; inherited attributes propagate information down from parents. Attribute grammars in general, and DMS's in particular, allow mixing these, so information can flow up and down the tree in arbitrary order.

Most AGs are purely functional, i.e., free of side effects; DMS's allows side effects, which complicates the notation but is pretty useful in practice. A nice example is the construction of symbol tables by attribute evaluation: the current scope is passed down the tree, local-declaration blocks create new current scopes and pass them down, and individual declarations store symbol table data into the symbol table entry received from a parent.

DMS's AG compiler analyzes the attribute computations, determines how information flows across the entire tree, and then generates a parallel PARLANSE program to implement the information flow per tree node type. It can do a data flow analysis on each AG rule to determine the information flow: what has to happen first vs. later vs. last. For our simple sum rule above, it should be clear that the attributes of the children have to be computed before the attribute for the root can be computed. It turns out that PARLANSE's partial-order constructs are a perfect way to encode this information, and they even nicely handle the side effects we added to AGs.
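
As a rough illustration (hypothetical action names, reusing the (!| ...) partial-order notation shown earlier; this is not actual generated code), the sum rule above maps onto a team shaped roughly like:

    (!| left   compute-sum1-result
        right  compute-product-result
        root   (>> left right) compute-sum0-result )

The two child evaluations carry no ordering constraint between them, so they may run as parallel grains, while the root's computation is ordered after both complete.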

The result is that DMS compiles an AG specification into a highly parallel PARLANSE program. Our C++ front end's name/type resolver is implemented as about 150K lines of DMS AG (yes, it takes that much to describe how C++ type information is used), which compiles to some 700K SLOC of parallel PARLANSE code. And it works (and runs in parallel on x86 multicore machines) without any extra thought or debugging, and it seems to scale nicely.

Ira Baxter