I generated this dendrogram using R's hclust(), as.dendrogram() and plot.dendrogram() functions.

I used the dendrapply() function and a local function to color leaves, which is working fine.

I have results from a statistical test that indicate whether a set of nodes (e.g. the cluster of "+v_stat5a_01" and "+v_stat5b_01" in the lower-right corner of the tree) is significant or important.

I also have a local function that I can use with dendrapply() that finds the exact node in my dendrogram which contains significant leaves.

I would like to either (following the example):

  1. Color the edges that join "+v_stat5a_01" and "+v_stat5b_01"; or,
  2. Draw a rect() around "+v_stat5a_01" and "+v_stat5b_01"

I have the following local function (the details of the "nodes-in-leafList-match-nodes-in-clusterList" condition aren't important; what matters is that it identifies the significant nodes):

markSignificantClusters <<- function (n) {
  if (!is.leaf(n)) {
     a <- attributes(n)
     leafList <- unlist(dendrapply(n, listLabels))
     for (clusterIndex in 1:length(significantClustersList[[1]])) {
       clusterList <- unlist(significantClustersList[[1]][clusterIndex])
       if (nodes-in-leafList-match-nodes-in-clusterList) {
          # I now have a node "n" that contains significant leaves, and
          # I'd like to use a dendrapply() call to another local function
          # which colors the edges that run down to the leaves; or, draw
          # a rect() around the leaves
       }
     }
  }
}

From within this if block, I have tried calling dendrapply(n, markEdges), but this did not work:

markEdges <<- function (n) {
  a <- attributes(n)
  attr(n, "edgePar") <- c(a$edgePar, list(lty=3, col="red"))
}

In my ideal example, the edges connecting "+v_stat5a_01" and "+v_stat5b_01" would be dashed and red.
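One likely reason the dendrapply(n, markEdges) call has no visible effect is that markEdges() does not return the modified node: dendrapply() rebuilds the tree from whatever the function returns, and the result also has to be assigned back into the tree that is plotted. The following is only a minimal sketch of that idea on toy data. The matrix, the labels, and the choice of the root's first subtree as the "significant" cluster are all assumptions made for illustration, not part of my real data.

```r
# Toy data; the labels and the clustering are illustrative only.
set.seed(1)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("leaf", 1:8), NULL))
dend <- as.dendrogram(hclust(dist(m)))

# dendrapply() builds its result from what FUN returns, so the
# function has to hand back the (modified) node itself.
markEdges <- function(n) {
  attr(n, "edgePar") <- c(attr(n, "edgePar"), list(lty = 3, col = "red"))
  n
}

# Hypothetical stand-in for the significance test: treat the first
# subtree of the root as the significant cluster, and store the
# re-attributed subtree back into the full tree.
dend[[1]] <- dendrapply(dend[[1]], markEdges)

plot(dend, horiz = TRUE)
```

With this change, the edges of the marked subtree should be drawn dashed and red, while the rest of the tree keeps the default edge style.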

I have also tried using rect.hclust() within this if block:

ma <- match(leafList, orderedLabels)  
rect.hclust(scoreClusterObj, h = a$height, x = c(min(ma), max(ma)), border = 2)

But this does not work with horizontal dendrograms (i.e. dendrograms with horizontal labels). Here is an example (note the red stripe in the lower-right corner). The dimensions of the rectangle that rect.hclust() draws are not correct, and I don't understand how the function works well enough to write my own version.

I appreciate any advice for getting edgePar or rect.hclust() to work properly, or to be able to write my own rect.hclust() equivalent.

UPDATE

Since asking this question, I used getAnywhere(rect.hclust) to get the source code that calculates the parameters and draws the rect object. I wrote a custom version of this function to handle horizontal and vertical leaves, and I call it with dendrapply().

However, there is some kind of clipping effect that removes part of the rect. For horizontal leaves (leaves that are drawn on the right side of the tree), the rightmost edge of the rect either disappears or is thinner than the border width of the other three sides of the rect. For vertical leaves (leaves that are drawn on the bottom of the tree), the bottommost edge of the rect suffers the same display problem.

What I have done as a means of marking significant clusters is to reduce the width of the rect so that I render a vertical red stripe between the tips of the cluster edges and the (horizontal) leaf labels.

This eliminates the clipping issue, but introduces another problem, in that the space between the cluster edge tips and the leaf labels is only six or so pixels wide, which I don't have much control over. This limits the width of the vertical stripe.

The worse problem is that the x-coordinate that marks where the vertical stripe can fit between the two elements changes with the width of the larger tree (par("usr")), which in turn depends on how the tree hierarchy ends up being structured.

I wrote a "correction" or, better termed, a hack to adjust this x value and the rect width for horizontal trees. It doesn't always work consistently, but for the trees I am making, it seems to keep from getting too close to (or overlapping) edges and labels.

Ultimately, a better fix would be to find out how to draw the rect so that there is no clipping. Or a consistent way to calculate the specific x position in between tree edges and labels for any given tree, so as to center and size the stripe properly.

I would also be very interested in a method for annotating edges with colors or line styles.

A:

So you've actually asked about five questions (5 +/- 3). As for writing your own rect.hclust-like function, the source is in library/stats/R/identify.hclust.R if you want to look at it.

I took a quick glance at it myself and am not sure it does what I thought it did from reading your description. It seems to be drawing multiple rectangles. Also, the x selector appears to be hard-coded to segregate the tags horizontally (which isn't what you want, and there's no y).

I'll be back, but in the meantime you might (in addition to looking at the source) try making multiple rect.hclust() calls with different border= colors and different h= values to see if a failure pattern emerges.
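A rough sketch of that experiment, on made-up data rather than your tree (the matrix, the three cut heights, and the colors are arbitrary choices of mine). Note that rect.hclust() draws onto a plot of the hclust object itself, which has the leaves along the bottom:

```r
# Toy data; replace with your own hclust object.
set.seed(1)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("leaf", 1:8), NULL))
hc <- hclust(dist(m))

plot(hc)

# Cut at several heights, one border color per height, so that each
# set of rectangles can be told apart on the plot.
heights <- quantile(hc$height, c(0.5, 0.75, 0.9))
borders <- c("red", "blue", "darkgreen")

clusters <- vector("list", length(heights))
for (i in seq_along(heights)) {
  # rect.hclust() invisibly returns the members of each boxed cluster.
  clusters[[i]] <- rect.hclust(hc, h = heights[[i]], border = borders[i])
}
```

Comparing the returned cluster memberships against where the boxes land may help show which of the arguments is being misread.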

Update

I haven't had much luck poking at this either.

One possible kludge for the clipping would be to pad the labels with trailing spaces and then bring the edge of your rectangle in slightly (the idea being that just bringing the rectangle in would get it out of the clipping zone but overwrite the ends of the labels).

Another idea would be to fill the rectangle with a translucent (low alpha) color, making a shaded area rather than a bounding box.
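Something along these lines, as an untested-on-your-data sketch. It assumes the leaves of a horizontal dendrogram sit at y = 1, 2, ... in the order given by labels(), and that the clipping comes from drawing at the edge of the plot region, which is why xpd = NA is passed; both are assumptions on my part. The toy data and the choice of the first subtree as the "significant" cluster are placeholders.

```r
# Toy data; the labels and the clustering are illustrative only.
set.seed(1)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("leaf", 1:8), NULL))
dend <- as.dendrogram(hclust(dist(m)))

plot(dend, horiz = TRUE)

# Hypothetical "significant" cluster: the first subtree of the root.
sub <- dend[[1]]

# Positions of the subtree's leaves along the leaf axis.
pos <- match(labels(sub), labels(dend))

# Low-alpha fill instead of a border, so there is no edge to clip.
shade <- adjustcolor("red", alpha.f = 0.2)
rect(xleft = attr(sub, "height"), ybottom = min(pos) - 0.5,
     xright = 0, ytop = max(pos) + 0.5,
     col = shade, border = NA, xpd = NA)
```

Because the fill has no outline, a slightly misplaced edge is much less noticeable than a missing or thinned border line.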

MarkusQ