OP appears to be interested in languages that process trees and have some parallelism support.
Our DMS Software Reengineering Toolkit is a general-purpose program analysis and transformation engine. It parses programming language source text into trees (OP's first interest), enables analysis and processing of those trees, and regenerates source text from those trees.
DMS is driven by explicit grammars that describe the language being processed, and it has a lot of production-quality language definitions available for it.
DMS's tree processing machinery provides support for parallelism at two different levels. The first is supported directly by DMS's underlying parallel programming language, PARLANSE, which is inspired by LISP but isn't nearly so dynamic.
PARLANSE provides for "teams" of parallel activities called "grains" (as in sand on a beach; the idea is that there are lots of them). Teams may be structured dynamically, by (classically) forking new grains and adding them to a (dynamic) team; this is useful but not spectacularly fast. Teams may also be structured statically, including "fork this fixed-size set of grains as pure parallel":
(|| a b c)
and "create a set of grains whose execution order is controlled by a specified partial order" (!| ... ). You write the partial order directly in the fork call:
(!| step1 a
    step2 b
    step3 (>> step1 step2) c
    step4 (>> step2) d )
which encodes the fact that action c occurs only after (>>, "later in time than") actions a and b complete, and that d occurs only after b completes. The statically formed teams are precompiled by the PARLANSE compiler into pretty efficient structures, in that the grains can be launched and killed very quickly, allowing pretty small grain sizes (a few hundred instructions). Standard parallelism libraries have much higher overhead than this.
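For readers without PARLANSE at hand, here is a rough C++ analog of that partial-order fork, sketched with std::async futures; the step names and actions a..d come from the snippet above, everything else is illustrative, and a thread-based sketch like this has far higher per-grain overhead than PARLANSE's precompiled teams:

#include <future>
#include <iostream>

void a() { std::cout << "a\n"; }
void b() { std::cout << "b\n"; }
void c() { std::cout << "c\n"; }
void d() { std::cout << "d\n"; }

int main() {
    // step1 and step2 have no predecessors: launch immediately.
    std::shared_future<void> step1 = std::async(std::launch::async, a);
    std::shared_future<void> step2 = std::async(std::launch::async, b);
    // step3 (>> step1 step2): c may start only after a and b complete.
    auto step3 = std::async(std::launch::async,
                            [=] { step1.wait(); step2.wait(); c(); });
    // step4 (>> step2): d may start only after b completes.
    auto step4 = std::async(std::launch::async,
                            [=] { step2.wait(); d(); });
    step3.wait();
    step4.wait();
}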
The basic method for processing trees is pretty conventional: there's a PARLANSE library of facilities for inspecting tree nodes, walking up and down the tree, and creating new nodes and splicing them into place. Recursive procedures are often used to visit/update a tree.
The key observation here is that a tree visit which visits some set of children can be coded sequentially, or easily coded as a static parallel team. So it is pretty easy to manually code parallel tree visits, at the cost of writing lots of special cases to parallel-fork for each tree node type. That gives the "divide and conquer" that seems of interest to the OP. (You can of course enumerate a node's children and use a dynamic team to fork a grain for each child, but this isn't as fast.) This is the first level of parallelism used to process trees in DMS.
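As an illustration (again in C++ rather than PARLANSE, with a hypothetical Node type), a statically forked visit over a binary node looks roughly like this; the real thing has one such fork case per node type/arity:

#include <future>

struct Node {               // hypothetical two-child node for brevity
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
};

int sumTree(Node* n) {
    if (!n) return 0;
    // Static two-way fork over the children, analogous to (|| a b):
    auto l = std::async(std::launch::async, sumTree, n->left);
    auto r = std::async(std::launch::async, sumTree, n->right);
    return n->value + l.get() + r.get();
}

int main() {
    Node leaf1{1}, leaf2{2};
    Node root{3, &leaf1, &leaf2};
    return sumTree(&root) == 6 ? 0 : 1;
}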
The second level of parallelism comes through a DMS-supplied DSL that implements attribute grammars (AGs).
AGs are functional languages that decorate the BNF with a set of computations. One can write a simple calculator with an AG in DMS:
sum = sum + product;
<<Calculator>>: sum[0].result = sum[1].result + product.result;
This causes an attribute, "result", to be computed for the root (sum[0]) by combining the "result" attributes of the first child (sum[1]) and the second child (product).
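In conventional terms, each such AG equation is just a function from child attributes to the parent attribute. A minimal C++ rendering of the "result" attribute (with an illustrative Expr type, not DMS's generated code) would be:

#include <memory>

struct Expr {                        // illustrative AST node
    int leafValue = 0;               // used when there are no children
    std::unique_ptr<Expr> lhs, rhs;  // sum[1] and product, when present
};

// Synthesized attribute: the parent's value is computed from the children.
int result(const Expr& e) {
    if (!e.lhs) return e.leafValue;  // leaf case
    // sum[0].result = sum[1].result + product.result;
    return result(*e.lhs) + result(*e.rhs);
}

int main() {
    Expr e;
    e.lhs = std::make_unique<Expr>();
    e.lhs->leafValue = 2;
    e.rhs = std::make_unique<Expr>();
    e.rhs->leafValue = 3;
    return result(e) == 5 ? 0 : 1;
}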
So-called synthesized attributes propagate information up the tree from the leaves; inherited attributes propagate information down from the parents. Attribute grammars in general, and DMS's AGs in particular, allow mixing these, so information can flow up and down the tree in arbitrary order.
Most AGs are purely functional, i.e., no side effects; DMS's allows side effects, which complicates the notation but is pretty useful in practice. A nice example is construction of symbol tables by attribute evaluation: the current scope is passed down the tree, local declaration blocks create new scopes and pass those down, and individual declarations store symbol table data into the symbol table entry received from a parent.
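A sketch of that symbol-table idiom, once more in C++ with hypothetical types (DMS's actual symbol-table machinery is in PARLANSE and considerably richer): the current scope is the inherited attribute flowing down, and declaration nodes perform the insertion side effect.

#include <map>
#include <memory>
#include <string>
#include <vector>

struct Scope {
    Scope* parent = nullptr;
    std::map<std::string, std::string> symbols;  // name -> type
};

struct AstNode {
    bool opensScope = false;                     // e.g., a { ... } block
    std::string declName, declType;              // set on declaration nodes
    std::vector<std::unique_ptr<AstNode>> children;
    std::unique_ptr<Scope> localScope;           // created when opensScope
};

// currentScope flows down (inherited); declarations write into it (side effect).
void buildSymbols(AstNode& n, Scope& currentScope) {
    Scope* scope = &currentScope;
    if (n.opensScope) {
        n.localScope = std::make_unique<Scope>();
        n.localScope->parent = &currentScope;
        scope = n.localScope.get();
    }
    if (!n.declName.empty())
        scope->symbols[n.declName] = n.declType;  // the side effect
    for (auto& child : n.children)
        buildSymbols(*child, *scope);
}

int main() {
    Scope globals;
    AstNode block;
    block.opensScope = true;
    auto decl = std::make_unique<AstNode>();
    decl->declName = "x";
    decl->declType = "int";
    block.children.push_back(std::move(decl));
    buildSymbols(block, globals);
    return block.localScope->symbols.count("x") == 1 ? 0 : 1;
}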
DMS's AG compiler analyzes the attribute computations, determines how information flows across the entire tree, and then generates a parallel PARLANSE program to implement the information flow per tree node type. It does a data-flow analysis on each AG rule to determine the information flow: what has to happen first, vs. later, vs. last. For our simple sum rule above, it should be clear that the attributes of the children have to be computed before the attribute of the root. It turns out that PARLANSE's partial-order construct is a perfect way to encode this information, and it even handles nicely the side effects we added to AGs.
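To make the generated schedule concrete, here is what the partial order for the sum rule amounts to, sketched with C++ futures; the child evaluators are stand-ins, and DMS of course emits PARLANSE (!| ...) teams, not C++:

#include <future>

// Stand-ins for the recursive child-attribute evaluators that get generated.
int sumChildResult()     { return 2; }
int productChildResult() { return 3; }

int main() {
    // The two child attributes do not depend on each other: fork in parallel.
    auto r1 = std::async(std::launch::async, sumChildResult);
    auto r2 = std::async(std::launch::async, productChildResult);
    // The root attribute depends on both, like step3 (>> step1 step2) above.
    int rootResult = r1.get() + r2.get();
    return rootResult == 5 ? 0 : 1;
}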
The result is that DMS compiles an AG specification into a highly parallel PARLANSE program. Our C++ front end's name/type resolver is implemented as about 150K lines of DMS AG (yes, it takes that much to describe how C++ type information is used), which compiles to some 700K SLOC of parallel PARLANSE code. That code works (and runs in parallel on x86 multicore machines) without any extra thought or debugging of the parallelism, and it seems to scale nicely.